cuDF API Reference

DataFrame

class cudf.core.dataframe.DataFrame(data=None, index=None, columns=None, dtype=None)

A GPU Dataframe object.

Parameters
data: array-like, Iterable, dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects.

index: Index or array-like

Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.

columns: Index or array-like

Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype: dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer.

Examples

Build dataframe with __setitem__:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0

Build DataFrame via dict of columns:

>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> t0 = datetime.strptime('2018-10-07 12:00:00', '%Y-%m-%d %H:%M:%S')
>>> n = 5
>>> df = cudf.DataFrame({
...     'id': np.arange(n),
...     'datetimes': np.array(
...     [(t0+ timedelta(seconds=x)) for x in range(n)])
... })
>>> df
    id                datetimes
0    0  2018-10-07T12:00:00.000
1    1  2018-10-07T12:00:01.000
2    2  2018-10-07T12:00:02.000
3    3  2018-10-07T12:00:03.000
4    4  2018-10-07T12:00:04.000

Build DataFrame via list of rows as tuples:

>>> df = cudf.DataFrame([
...     (5, "cats", "jump", np.nan),
...     (2, "dogs", "dig", 7.5),
...     (3, "cows", "moo", -2.1, "occasionally"),
... ])
>>> df
   0     1     2     3             4
0  5  cats  jump  <NA>          <NA>
1  2  dogs   dig   7.5          <NA>
2  3  cows   moo  -2.1  occasionally

Convert from a Pandas DataFrame:

>>> import pandas as pd
>>> pdf = pd.DataFrame({'a': [0, 1, 2, 3],'b': [0.1, 0.2, None, 0.3]})
>>> pdf
   a    b
0  0  0.1
1  1  0.2
2  2  NaN
3  3  0.3
>>> df = cudf.from_pandas(pdf)
>>> df
   a     b
0  0   0.1
1  1   0.2
2  2  <NA>
3  3   0.3
Attributes
T

Transpose index and columns.

at

Alias for DataFrame.loc; provided for compatibility with Pandas.

columns

Returns a tuple of columns

dtypes

Return the dtypes in this object.

empty

Indicator whether DataFrame or Series is empty.

iat

Alias for DataFrame.iloc; provided for compatibility with Pandas.

iloc

Selecting rows and columns by position.

index

Returns the index of the DataFrame

loc

Selecting rows and columns by label or boolean mask.

ndim

Dimension of the data.

shape

Returns a tuple representing the dimensionality of the DataFrame.

size

Return the number of elements in the underlying data.

values

Return a CuPy representation of the DataFrame.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

add(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

agg(aggs[, axis])

Aggregate using one or more operations over the specified axis.

all([axis, bool_only, skipna, level])

Return whether all elements are True in DataFrame.

any([axis, bool_only, skipna, level])

Return whether any element is True in DataFrame.

append(other[, ignore_index, …])

Append rows of other to the end of caller, returning a new object.

apply_chunks(func, incols, outcols[, …])

Transform user-specified chunks using the user-provided function.

apply_rows(func, incols, outcols, kwargs[, …])

Apply a row-wise user defined function.

argsort([ascending, na_position])

Sort by the values.

as_gpu_matrix([columns, order])

Convert to a matrix in device memory.

as_matrix([columns])

Convert to a matrix in host memory.

asin()

Get Trigonometric inverse sine, element-wise.

assign(**kwargs)

Assign columns to DataFrame from keyword arguments.

astype(dtype[, copy, errors])

Cast the DataFrame to the given dtype

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([deep])

Make a copy of this object’s indices and data.

corr()

Compute the correlation matrix of a DataFrame.

cos()

Get Trigonometric cosine, element-wise.

count([axis, level, numeric_only])

Count non-NA cells for each column or row.

cov(**kwargs)

Compute the covariance matrix of a DataFrame.

cummax([axis, skipna])

Return cumulative maximum of the DataFrame.

cummin([axis, skipna])

Return cumulative minimum of the DataFrame.

cumprod([axis, skipna])

Return cumulative product of the DataFrame.

cumsum([axis, skipna])

Return cumulative sum of the DataFrame.

describe([percentiles, include, exclude, …])

Generate descriptive statistics.

div(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

drop([labels, axis, index, columns, level, …])

Drop specified labels from rows or columns.

drop_duplicates([subset, keep, inplace, …])

Return DataFrame with duplicate rows removed, optionally only considering certain subset of columns.

dropna([axis, how, thresh, subset, inplace])

Drops rows (or columns) containing nulls from a Column.

equals(other)

Test whether two objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

explode(column[, ignore_index])

Transform each element of a list-like to a row, replicating index values.

fillna([value, method, axis, inplace, limit])

Fill null values with value or specified method.

floordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

from_arrow(table)

Convert from PyArrow Table to DataFrame.

from_pandas(dataframe[, nan_as_null])

Convert from a Pandas DataFrame.

from_records(data[, index, columns, nan_as_null])

Convert structured or record ndarray to DataFrame.

hash_columns([columns])

Hash the given columns and return a new device array

head([n])

Returns the first n rows as a new DataFrame

info([verbose, buf, max_cols, memory_usage, …])

Print a concise summary of a DataFrame.

insert(loc, name, value)

Add a column to DataFrame at the index specified by loc.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Whether each element in the DataFrame is contained in values.

isna()

Identify missing values.

isnull()

Identify missing values.

iteritems()

Iterate over column names and series pairs

join(other[, on, how, lsuffix, rsuffix, …])

Join columns with other DataFrame on index or on a key column.

keys()

Get the columns.

kurt([axis, skipna, level, numeric_only])

Return Fisher’s unbiased kurtosis of a sample.

kurtosis([axis, skipna, level, numeric_only])

Return Fisher’s unbiased kurtosis of a sample.

label_encoding(column, prefix, cats[, …])

Encode labels in a column with label encoding.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max([axis, skipna, level, numeric_only])

Return the maximum of the values in the DataFrame.

mean([axis, skipna, level, numeric_only])

Return the mean of the values for the requested axis.

melt(**kwargs)

Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

memory_usage([index, deep])

Return the memory usage of each column in bytes.

merge(right[, on, left_on, right_on, …])

Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.

min([axis, skipna, level, numeric_only])

Return the minimum of the values in the DataFrame.

mod(other[, axis, level, fill_value])

Get Modulo division of dataframe and other, element-wise (binary operator mod).

mode([axis, numeric_only, dropna])

Get the mode(s) of each element along the selected axis.

mul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

nans_to_nulls()

Convert nans (if any) to nulls.

nlargest(n, columns[, keep])

Get the rows of the DataFrame sorted by the n largest values of columns.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

nsmallest(n, columns[, keep])

Get the rows of the DataFrame sorted by the n smallest values of columns.

one_hot_encoding(column, prefix, cats[, …])

Expand a column with one-hot-encoding.

partition_by_hash(columns, nparts[, keep_index])

Partition the dataframe by the hashed value of data in columns.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

pivot(index, columns[, values])

Return reshaped DataFrame organized by the given index and column values.

pop(item)

Return a column and drop it from the DataFrame.

pow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

prod([axis, skipna, dtype, level, …])

Return product of the values in the DataFrame.

product([axis, skipna, dtype, level, …])

Return product of the values in the DataFrame.

quantile([q, axis, numeric_only, …])

Return values at the given quantile.

quantiles([q, interpolation])

Return values at the given quantile.

query(expr[, local_dict])

Query with a boolean expression using Numba to compile a GPU kernel.

radd(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rdiv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

reindex([labels, axis, index, columns, copy])

Return a new DataFrame whose axes conform to a new index

rename([mapper, index, columns, axis, copy, …])

Alter column and index labels.

repeat(repeats[, axis])

Repeats elements consecutively.

replace([to_replace, value, inplace, limit, …])

Replace values given in to_replace with replacement.

reset_index([level, drop, inplace, …])

Reset the index.

rfloordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

rmod(other[, axis, level, fill_value])

Get Modulo division of dataframe and other, element-wise (binary operator rmod).

rmul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

round([decimals])

Round a DataFrame to a variable number of decimal places.

rpow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

rsub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

rtruediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

select_dtypes([include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

set_index(keys[, drop, append, inplace, …])

Return a new DataFrame with a new index

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

skew([axis, skipna, level, numeric_only])

Return unbiased Fisher-Pearson skew of a sample.

sort_index([axis, level, ascending, …])

Sort object by labels (along an axis).

sort_values(by[, axis, ascending, inplace, …])

Sort by the values row-wise.

sqrt()

Get the non-negative square-root of all elements, element-wise.

stack([level, dropna])

Stack the prescribed level(s) from columns to index

std([axis, skipna, level, ddof, numeric_only])

Return sample standard deviation of the DataFrame.

sub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

sum([axis, skipna, dtype, level, …])

Return sum of the values in the DataFrame.

tail([n])

Returns the last n rows as a new DataFrame

take(positions[, keep_index])

Return a new DataFrame containing the rows specified by positions

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_arrow([preserve_index])

Convert to a PyArrow Table.

to_csv([path_or_buf, sep, na_rep, columns, …])

Write a dataframe to csv file format.

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_feather(path, *args, **kwargs)

Write a DataFrame to the feather format.

to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

to_json([path_or_buf])

Convert the cuDF object to a JSON string.

to_orc(fname[, compression])

Write a DataFrame to the ORC format.

to_pandas([nullable])

Convert to a Pandas DataFrame.

to_parquet(path, *args, **kwargs)

Write a DataFrame to the parquet format.

to_records([index])

Convert to a numpy recarray

to_string()

Convert to string

transpose()

Transpose index and columns.

truediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

unstack([level, fill_value])

Pivot one or more levels of the (necessarily hierarchical) index labels.

update(other[, join, overwrite, …])

Modify a DataFrame in place using non-NA values from another DataFrame.

var([axis, skipna, level, ddof, numeric_only])

Return unbiased variance of the DataFrame.

where(cond[, other, inplace])

Replace values where the condition is False.

groupby

rolling

property T

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Returns
out: DataFrame

The transposed DataFrame.

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
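
The inverse relationship stated above holds only when x lies in [0, π], the range of acos. The following is a minimal sketch that uses Python's standard math module as a stand-in for the element-wise GPU operation; it does not call cuDF, and the variable names are illustrative only.

```python
import math

# Stand-in for element-wise cos/acos using the standard library (not cuDF).
xs = [0.0, 0.5, 1.0, 3.0]            # every value lies inside [0, pi]
round_trip = [math.acos(math.cos(x)) for x in xs]

# Inside [0, pi] the round trip recovers x up to floating-point error.
in_range_ok = all(
    math.isclose(x, r, abs_tol=1e-9) for x, r in zip(xs, round_trip)
)

# Outside [0, pi] acos returns the principal value rather than x itself.
outside = math.acos(math.cos(4.0))   # close to 2*pi - 4.0, not 4.0
```

For inputs outside that interval, expect the principal value rather than the original input.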
add(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
other: scalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_value: float or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
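
The fill_value rule described in the parameters can be sketched in plain Python. This is an illustration of the documented semantics only, assuming None marks a missing value and both inputs are already aligned; add_with_fill is a hypothetical helper, not part of cuDF.

```python
def add_with_fill(left, right, fill_value=None):
    """Element-wise add of two aligned lists; None marks a missing value."""
    out = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Missing in both inputs: the result stays missing.
            out.append(None)
        elif fill_value is None and (a is None or b is None):
            # No fill_value given: a missing operand propagates.
            out.append(None)
        else:
            # Substitute fill_value for the single missing operand, then add.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
            out.append(a + b)
    return out


left = [1, None, 3, None]
right = [10, 20, None, None]

plain = add_with_fill(left, right)                  # behaves like left + right
filled = add_with_fill(left, right, fill_value=0)   # behaves like add(..., fill_value=0)
```

Here plain keeps every row with a missing operand as missing, while filled only leaves the last row missing because both inputs lack a value there.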
agg(aggs, axis=None)

Aggregate using one or more operations over the specified axis.

Parameters
aggs: Iterable (set, list, string, tuple or dict)
Function to use for aggregating data. Accepted types are:
  • string name, e.g. "sum"

  • list of functions, e.g. ["sum", "min", "max"]

  • dict mapping axis labels to the operations for each column, e.g. {"a": "sum"}

axis: not yet supported
Returns
Aggregation Result: Series or DataFrame

When DataFrame.agg is called with a single aggregation, a Series is returned. When DataFrame.agg is called with several aggregations, a DataFrame is returned.

Notes

Difference from pandas:
  • Not supporting: axis, *args, **kwargs
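
The relationship between the form of aggs and the shape of the result can be sketched in plain Python. This is a rough model, not cuDF's implementation: AGGS and agg_sketch are hypothetical names, a flat dict stands in for a Series, and a nested dict stands in for a DataFrame.

```python
# Hypothetical lookup table; only three aggregations are modelled.
AGGS = {"sum": sum, "min": min, "max": max}


def agg_sketch(columns, aggs):
    """columns maps a column name to a list of numbers."""
    if isinstance(aggs, str):
        # Single aggregation: one value per column (Series-like result).
        return {name: AGGS[aggs](vals) for name, vals in columns.items()}
    if isinstance(aggs, dict):
        # Per-column aggregation, e.g. {"a": "sum"} (modelled as Series-like).
        return {name: AGGS[op](columns[name]) for name, op in aggs.items()}
    # Several aggregations: one row per aggregation (DataFrame-like result).
    return {
        op: {name: AGGS[op](vals) for name, vals in columns.items()}
        for op in aggs
    }


columns = {"a": [1, 2, 3], "b": [4, 5, 6]}
single = agg_sketch(columns, "sum")
several = agg_sketch(columns, ["sum", "min"])
per_column = agg_sketch(columns, {"a": "sum"})
```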

all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True in DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns
Series

Notes

Parameters currently not supported are axis, bool_only, level.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.all()
a     True
b    False
dtype: bool
any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True in DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns
Series

Notes

Parameters currently not supported are axis, bool_only, level.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.any()
a    True
b    True
dtype: bool
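
The skipna rules described for all and any can be sketched per column in plain Python. This models only the documented behaviour, assuming None marks an NA value; column_all and column_any are hypothetical helpers, not cuDF functions.

```python
def _truthy(value):
    # With skipna=False an NA that remains is treated as True.
    return True if value is None else bool(value)


def column_all(values, skipna=True):
    if skipna:
        values = [v for v in values if v is not None]
    # all() of an empty sequence is True, matching the all-NA case.
    return all(_truthy(v) for v in values)


def column_any(values, skipna=True):
    if skipna:
        values = [v for v in values if v is not None]
    # any() of an empty sequence is False, matching the all-NA case.
    return any(_truthy(v) for v in values)


a = [3, 2, 3, 4]
b = [7, 0, 10, 10]
all_na = [None, None]
```

An all-NA column therefore gives True for all and False for any under the default skipna=True, and True for both when skipna=False.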
append(other, ignore_index=False, verify_integrity=False, sort=False)

Append rows of other to the end of caller, returning a new object. Columns in other that are not in the caller are added as new columns.

Parameters
other: DataFrame or Series/dict-like object, or list of these

The data to append.

ignore_index: bool, default False

If True, do not use the index labels.

sort: bool, default False

Sort columns ordering if the columns of self and other are not aligned.

verify_integrity: bool, default False

This parameter is currently not supported.

Returns
DataFrame

See also

cudf.core.reshape.concat

General function to concatenate DataFrame or objects.

Notes

If a list of dict/Series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.

Iteratively appending rows to a cudf DataFrame can be more computationally intensive than a single concatenation. A better solution is to collect those rows in a list and then concatenate the list with the original DataFrame all at once.

The verify_integrity parameter is not supported yet.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = cudf.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df2
   A  B
0  5  6
1  7  8
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

With ignore_index set to True:

>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The following, while not recommended methods for generating DataFrames, show two ways to generate a DataFrame from multiple data sources. Less efficient:

>>> df = cudf.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4

More efficient than above:

>>> cudf.concat([cudf.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4
apply_chunks(func, incols, outcols, kwargs=None, pessimistic_nulls=True, chunks=None, blkct=None, tpb=None)

Transform user-specified chunks using the user-provided function.

Parameters
df: DataFrame

The source dataframe.

func: function

The transformation function that will be executed on the CUDA GPU.

incols: list or dict

A list of names of input columns that match the function arguments. Or, a dictionary mapping input column names to their corresponding function arguments such as {'col1': 'arg1'}.

outcols: dict

A dictionary of output column names and their dtype.

kwargs: dict

name-value of extra arguments. These values are passed directly into the function.

pessimistic_nulls: bool

Whether or not the output should be null when any corresponding input is null. If False, all outputs will be non-null, but will be the result of applying func against the underlying column data, which may be garbage.

chunks: int or Series-like

If it is an int, it is the chunk size. If it is an array, it contains the integer offset of the start of each chunk. The span of the i-th chunk is data[chunks[i] : chunks[i + 1]] for any i + 1 < chunks.size, or data[chunks[i]:] for i == len(chunks) - 1.

tpb: int; optional

The threads-per-block for the underlying kernel. If not specified (Default), uses Numba .forall(...) built-in to query the CUDA Driver API to determine optimal kernel launch configuration. Specify 1 to emulate serial execution for each chunk; this is a good starting point but inefficient. The maximum possible value is limited by the available CUDA GPU resources.

blkct: int; optional

The number of blocks for the underlying kernel. If not specified (Default) and tpb is not specified (Default), uses Numba .forall(...) built-in to query the CUDA Driver API to determine optimal kernel launch configuration. If not specified (Default) and tpb is specified, uses chunks as the number of blocks.

Examples

For tpb > 1, func is executed by tpb number of threads concurrently. To access the thread id and count, use numba.cuda.threadIdx.x and numba.cuda.blockDim.x, respectively (See numba CUDA kernel documentation).

In the example below, the kernel is invoked concurrently on each specified chunk. The kernel computes the corresponding output for the chunk.

By looping over the range range(cuda.threadIdx.x, in1.size, cuda.blockDim.x), the kernel function can be used with any tpb in an efficient manner.

>>> from numba import cuda
>>> @cuda.jit
... def kernel(in1, in2, in3, out1):
...      for i in range(cuda.threadIdx.x, in1.size, cuda.blockDim.x):
...          x = in1[i]
...          y = in2[i]
...          z = in3[i]
...          out1[i] = x * y + z
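
How chunks and tpb interact can be sketched on the host in plain Python. This is a simulation under stated assumptions, not cuDF or Numba code: chunk_spans and apply_chunks_sim are hypothetical helpers, the thread index and block size are passed as ordinary arguments instead of cuda.threadIdx.x and cuda.blockDim.x, and threads run one after another rather than concurrently.

```python
def chunk_spans(nrows, chunks):
    """Return (start, stop) pairs for an int chunk size or a list of offsets."""
    if isinstance(chunks, int):
        starts = list(range(0, nrows, chunks))
    else:
        starts = list(chunks)
    # Each chunk ends where the next one starts; the last runs to the end.
    stops = starts[1:] + [nrows]
    return list(zip(starts, stops))


def kernel(in1, in2, in3, out1, thread_idx, block_dim):
    # Same strided loop as the CUDA kernel above, with explicit thread info.
    for i in range(thread_idx, len(in1), block_dim):
        out1[i] = in1[i] * in2[i] + in3[i]


def apply_chunks_sim(cols, chunks, tpb):
    nrows = len(cols[0])
    out = [None] * nrows
    for start, stop in chunk_spans(nrows, chunks):
        ins = [col[start:stop] for col in cols]
        chunk_out = [None] * (stop - start)
        for tid in range(tpb):  # concurrent on a GPU, serial in this sketch
            kernel(*ins, chunk_out, tid, tpb)
        out[start:stop] = chunk_out
    return out


in1 = list(range(10))
in2 = list(range(10))
in3 = [1] * 10
result = apply_chunks_sim([in1, in2, in3], chunks=4, tpb=3)
```

Because each simulated thread visits indices tid, tid + tpb, tid + 2 * tpb and so on, every element of a chunk is written exactly once for any tpb.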
apply_rows(func, incols, outcols, kwargs, pessimistic_nulls=True, cache_key=None)

Apply a row-wise user defined function.

Parameters
df: DataFrame

The source dataframe.

func: function

The transformation function that will be executed on the CUDA GPU.

incols: list or dict

A list of names of input columns that match the function arguments. Or, a dictionary mapping input column names to their corresponding function arguments such as {'col1': 'arg1'}.

outcols: dict

A dictionary of output column names and their dtype.

kwargs: dict

name-value of extra arguments. These values are passed directly into the function.

pessimistic_nulls: bool

Whether or not apply_rows output should be null when any corresponding input is null. If False, all outputs will be non-null, but will be the result of applying func against the underlying column data, which may be garbage.

Examples

The user function should loop over the columns and set the output for each row. Loop execution order is arbitrary, so each iteration of the loop MUST be independent of the others.

When func is invoked, the array args corresponding to the input/output are strided so as to improve GPU parallelism. The loop in the function resembles serial code, but executes concurrently in multiple threads.

>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame()
>>> nelem = 3
>>> df['in1'] = np.arange(nelem)
>>> df['in2'] = np.arange(nelem)
>>> df['in3'] = np.arange(nelem)

Define input columns for the kernel

>>> in1 = df['in1']
>>> in2 = df['in2']
>>> in3 = df['in3']
>>> def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
...     for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
...         out1[i] = kwarg2 * x - kwarg1 * y
...         out2[i] = y - kwarg1 * z

Call .apply_rows with the name of the input columns, the name and dtype of the output columns, and, optionally, a dict of extra arguments.

>>> df.apply_rows(kernel,
...               incols=['in1', 'in2', 'in3'],
...               outcols=dict(out1=np.float64, out2=np.float64),
...               kwargs=dict(kwarg1=3, kwarg2=4))
   in1  in2  in3 out1 out2
0    0    0    0  0.0  0.0
1    1    1    1  1.0 -2.0
2    2    2    2  2.0 -4.0
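
The same kernel can be run on the host in plain Python to check the table above and to illustrate pessimistic_nulls. This is only a sketch: apply_rows_sim is a hypothetical helper, None marks a null, and substituting 0 for a null stands in for whatever the underlying buffer happens to hold.

```python
def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
    for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z


def apply_rows_sim(func, incols, outnames, kwargs, pessimistic_nulls=True):
    nrows = len(next(iter(incols.values())))
    # Assumption: a null has no defined payload, so 0 stands in for it here.
    raw = {k: [0 if v is None else v for v in col] for k, col in incols.items()}
    outs = {name: [0.0] * nrows for name in outnames}
    func(*raw.values(), *outs.values(), **kwargs)
    if pessimistic_nulls:
        # Any null input in a row makes every output in that row null.
        for i in range(nrows):
            if any(col[i] is None for col in incols.values()):
                for out in outs.values():
                    out[i] = None
    return outs


cols = {"in1": [0, 1, 2], "in2": [0, 1, 2], "in3": [0, 1, 2]}
result = apply_rows_sim(kernel, cols, ["out1", "out2"], dict(kwarg1=3, kwarg2=4))

with_null = {"in1": [0, 1, 2], "in2": [0, None, 2], "in3": [0, 1, 2]}
masked = apply_rows_sim(kernel, with_null, ["out1", "out2"],
                        dict(kwarg1=3, kwarg2=4))
```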
argsort(ascending=True, na_position='last')

Sort by the values.

Parameters
ascending: bool or list of bool, default True

If True, sort values in ascending order, otherwise descending.

na_position: {‘first’ or ‘last’}, default ‘last’

Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

Returns
out_column_inds: cuDF Column of indices sorted based on input

Notes

Difference from pandas:

  • Support axis='index' only.

  • Not supporting: inplace, kind

  • Ascending can be a list of bools to control per column

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a':[10, 0, 2], 'b':[-10, 10, 1]})
>>> df
    a   b
0  10 -10
1   0  10
2   2   1
>>> inds = df.argsort()
>>> inds
0    1
1    2
2    0
dtype: int32
>>> df.take(inds)
    a   b
1   0  10
2   2   1
0  10 -10
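
The ordering that argsort returns, including a per-column ascending list and na_position, can be sketched in plain Python. This assumes rows are compared column by column from left to right and that None marks a missing value; argsort_rows is a hypothetical helper, not the cuDF implementation.

```python
def argsort_rows(columns, ascending=True, na_position="last"):
    """columns is a list of equal-length lists; None marks a missing value."""
    nrows = len(columns[0])
    if isinstance(ascending, bool):
        ascending = [ascending] * len(columns)
    order = list(range(nrows))
    # Stable sorts applied from the last key to the first give a
    # lexicographic ordering over all columns.
    for col, asc in reversed(list(zip(columns, ascending))):
        present = [i for i in order if col[i] is not None]
        missing = [i for i in order if col[i] is None]
        present.sort(key=lambda i: col[i], reverse=not asc)
        if na_position == "first":
            order = missing + present
        else:
            order = present + missing
    return order


a = [10, 0, 2]
b = [-10, 10, 1]
inds = argsort_rows([a, b])
taken_a = [a[i] for i in inds]   # rows of column a in sorted order
```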
as_gpu_matrix(columns=None, order='F')

Convert to a matrix in device memory.

Parameters
columns: sequence of str

List of column names to be extracted. The order is preserved. If None is specified, all columns are used.

order: ‘F’ or ‘C’

Optional argument to determine whether to return a column major (Fortran) matrix or a row major (C) matrix.

Returns
A (nrow x ncol) numba device ndarray
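
The difference between the two order values can be sketched with flat Python lists. This only illustrates column-major versus row-major layout; the names below are illustrative and no device memory is involved.

```python
columns = {"a": [1, 2, 3], "b": [4, 5, 6]}
nrow, ncol = 3, 2

# 'F' (column-major): each column is stored contiguously.
f_order = [columns[name][r] for name in columns for r in range(nrow)]

# 'C' (row-major): each row is stored contiguously.
c_order = [columns[name][r] for r in range(nrow) for name in columns]


def at(flat, r, c, order):
    """Look up element (r, c) of the nrow x ncol matrix in a flat buffer."""
    if order == "F":
        return flat[c * nrow + r]
    return flat[r * ncol + c]
```

Both layouts hold the same matrix; only the position of element (r, c) in the flat buffer differs.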
as_matrix(columns=None)

Convert to a matrix in host memory.

Parameters
columns: sequence of str

List of column names to be extracted. The order is preserved. If None is specified, all columns are used.

Returns
A (nrow x ncol) numpy ndarray in “F” order.
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
assign(**kwargs)

Assign columns to DataFrame from keyword arguments.

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df = df.assign(a=[0, 1, 2], b=[3, 4, 5])
>>> df
   a  b
0  0  3
1  1  4
2  2  5
astype(dtype, copy=False, errors='raise', **kwargs)

Cast the DataFrame to the given dtype

Parameters
dtype: data type, or dict of column name -> data type

Use a numpy.dtype or Python type to cast entire DataFrame object to the same type. Alternatively, use {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copy: bool, default False

Return a deep copy when copy=True. By default copy=False is used, so changes to the values may propagate to other cudf objects.

errors: {‘raise’, ‘ignore’, ‘warn’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised

  • ignore : suppress exceptions. On error return original object.

  • warn : prints the last exception as a warning and returns the original object.

**kwargs: extra arguments to pass on to the constructor
Returns
casted: DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [10, 20, 30], 'b': [1, 2, 3]})
>>> df
    a  b
0  10  1
1  20  2
2  30  3
>>> df.dtypes
a    int64
b    int64
dtype: object

Cast all columns to int32:

>>> df.astype('int32').dtypes
a    int32
b    int32
dtype: object

Cast a to float32 using a dictionary:

>>> df.astype({'a': 'float32'}).dtypes
a    float32
b      int64
dtype: object
>>> df.astype({'a': 'float32'})
      a  b
0  10.0  1
1  20.0  2
2  30.0  3
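
The three errors modes listed in the parameters can be sketched for a single column in plain Python. This models only the documented control flow: astype_column is a hypothetical helper, and a Python callable such as float or int stands in for a dtype.

```python
import warnings


def astype_column(values, caster, errors="raise"):
    """Cast every value with caster, handling failures per the errors mode."""
    try:
        return [caster(v) for v in values]
    except (TypeError, ValueError) as exc:
        if errors == "raise":
            raise                      # let the exception propagate
        if errors == "warn":
            warnings.warn(str(exc))    # report the failure as a warning
        return values                  # 'ignore' and 'warn' return the original


good = astype_column([10, 20, 30], float)
bad_input = ["1", "x"]
ignored = astype_column(bad_input, int, errors="ignore")
```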
property at

Alias for DataFrame.loc; provided for compatibility with Pandas.

atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lower: scalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upper: scalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplace: bool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
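
The way thresholds are applied can be sketched in plain Python, with a list-like threshold supplying one bound per column. This illustrates the documented behaviour only; clip_column and clip_frame are hypothetical helpers, not cuDF functions.

```python
def clip_column(values, lower=None, upper=None):
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower      # values below the lower threshold are raised to it
        if upper is not None and v > upper:
            v = upper      # values above the upper threshold are lowered to it
        out.append(v)
    return out


def clip_frame(columns, lower=None, upper=None):
    """columns maps name -> list; a list threshold gives one bound per column."""
    names = list(columns)

    def per_column(bound):
        if isinstance(bound, (list, tuple)):
            return list(bound)
        return [bound] * len(names)    # a scalar or None applies to every column

    lows, highs = per_column(lower), per_column(upper)
    return {
        name: clip_column(columns[name], lo, hi)
        for name, lo, hi in zip(names, lows, highs)
    }


frame = {"a": [1, 2, 3, 4], "b": ["a", "b", "c", "d"]}
both = clip_frame(frame, lower=[2, "b"], upper=[3, "c"])
upper_only = clip_frame(frame, upper=[3, "c"])
lower_only = clip_frame(frame, lower=[2, "b"])
```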
property columns

Returns a tuple of columns

copy(deep=True)

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below). When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Parameters
deepbool, default True

Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns
copySeries or DataFrame

Object type matches caller.

Examples

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s._column is shallow._column and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by the shallow copy and the original are reflected in both; the deep copy remains unchanged.

>>> s['a'] = 3
>>> shallow['b'] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64
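
As a rough analogy only, the same distinction exists for ordinary Python containers. The sketch below uses the standard copy module on a dict of lists; it says nothing about how cuDF manages device memory, and the variable names are illustrative.

```python
import copy

original = {"data": [1, 2], "index": ["a", "b"]}

shallow = copy.copy(original)    # new dict, but the lists inside are shared
deep = copy.deepcopy(original)   # new dict with its own copies of the lists

original["data"][0] = 3          # mutate the shared list in place

print(shallow["data"])  # [3, 2]  (sees the change)
print(deep["data"])     # [1, 2]  (unaffected)
```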
corr()

Compute the correlation matrix of a DataFrame.

cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
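
The outputs above show that the inputs are interpreted as radians (the cosine of 90 is about -0.448, not 0). A standard-library check of a few of the values, offered as a sketch rather than cuDF code:

```python
import math

inputs = [0.0, 45, 90, 180, 360]
cosines = [math.cos(x) for x in inputs]  # arguments are radians
for x, c in zip(inputs, cosines):
    print(f"{x:>6} -> {c: .6f}")

# An angle given in degrees has to be converted to radians first.
cos_90_degrees = math.cos(math.radians(90))
print(f"{cos_90_degrees:.6f}")
```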
count(axis=0, level=None, numeric_only=False, **kwargs)

Count non-NA cells for each column or row.

The values None, NaN, NaT are considered NA.

Returns
Series

For each column/row the number of non-NA/null entries.

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame({"Person":
...        ["John", "Myla", "Lewis", "John", "Myla"],
...        "Age": [24., np.nan, 21., 33, 26],
...        "Single": [False, True, True, True, False]})
>>> df.count()
Person    5
Age       4
Single    5
dtype: int64
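
The counting rule can be sketched in plain Python, treating None and float NaN as NA. The helper count_non_na is a hypothetical name used only for this illustration; it is not part of cuDF.

```python
import math


def count_non_na(column):
    """Count entries that are neither None nor a float NaN."""
    return sum(
        1
        for v in column
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )


columns = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33, 26],
    "Single": [False, True, True, True, False],
}
counts = {name: count_non_na(col) for name, col in columns.items()}
print(counts)  # {'Person': 5, 'Age': 4, 'Single': 5}
```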
cov(**kwargs)

Compute the covariance matrix of a DataFrame.

Parameters
**kwargs

Keyword arguments to be passed to cupy.cov

Returns
covDataFrame
cummax(axis=None, skipna=True, *args, **kwargs)

Return cumulative maximum of the DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
DataFrame

Notes

The axis parameter is currently not supported.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cummax()
   a   b
0  1   7
1  2   8
2  3   9
3  4  10
cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum of the DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
DataFrame

Notes

The axis parameter is currently not supported.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cummin()
   a  b
0  1  7
1  1  7
2  1  7
3  1  7
cumprod(axis=None, skipna=True, *args, **kwargs)

Return cumulative product of the DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
DataFrame

Notes

The axis parameter is currently not supported.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cumprod()
    a     b
0   1     7
1   2    56
2   6   504
3  24  5040
cumsum(axis=None, skipna=True, *args, **kwargs)

Return cumulative sum of the DataFrame.

Parameters
skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
DataFrame

Notes

The axis parameter is currently not supported.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cumsum()
    a   b
0   1   7
1   3  15
2   6  24
3  10  34
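
The four cumulative operations (cummax, cummin, cumprod, cumsum) are all running reductions over a column. A plain-Python sketch using itertools.accumulate on column b from the examples, with no nulls involved and not meant to describe cuDF's implementation:

```python
from itertools import accumulate
import operator

b = [7, 8, 9, 10]

running_max = list(accumulate(b, max))
running_min = list(accumulate(b, min))
running_prod = list(accumulate(b, operator.mul))
running_sum = list(accumulate(b))  # addition is the default

print(running_max)   # [7, 8, 9, 10]
print(running_min)   # [7, 7, 7, 7]
print(running_prod)  # [7, 56, 504, 5040]
print(running_sum)   # [7, 15, 24, 34]
```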
describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters
percentileslist-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include‘all’, list-like of dtypes or None(default), optional

A list of data types to include in the result. Ignored for Series. Here are the options:

  • ‘all’ : All columns of the input will be included in the output.

  • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

  • None (default) : The result will include all numeric columns.

excludelist-like of dtypes or None (default), optional,

A list of data types to omit from the result. Ignored for Series. Here are the options:

  • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'

  • None (default) : The result will exclude nothing.

datetime_is_numericbool, default False

For DataFrame input, this also controls whether datetime columns are included by default.

Returns
output_frameSeries or DataFrame

Summary statistics of the Series or Dataframe provided.

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For strings dtype or datetime dtype, the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples

Describing a Series containing numeric values.

>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> s
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
>>> s.describe()
count    10.00000
mean      5.50000
std       3.02765
min       1.00000
25%       3.25000
50%       5.50000
75%       7.75000
max      10.00000
dtype: float64
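
The numeric summary above can be cross-checked with the standard library. This sketch assumes the sample standard deviation (n - 1 denominator) and linearly interpolated percentiles, which is consistent with the values shown; it is not how cuDF computes them.

```python
import statistics

data = list(range(1, 11))

summary = {
    "count": len(data),
    "mean": statistics.mean(data),
    "std": statistics.stdev(data),  # sample standard deviation (n - 1)
    "min": min(data),
    "max": max(data),
}
# method="inclusive" interpolates linearly between the sorted data points.
q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
summary.update({"25%": q25, "50%": q50, "75%": q75})

for name, value in summary.items():
    print(f"{name:<6}{value:>10.5f}")
```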

Describing a categorical Series.

>>> s = cudf.Series(['a', 'b', 'a', 'b', 'c', 'a'], dtype='category')
>>> s
0    a
1    b
2    a
3    b
4    c
5    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
>>> s.describe()
count     6
unique    3
top       a
freq      3
dtype: object

Describing a timestamp Series.

>>> import numpy as np
>>> s = cudf.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s
0   2000-01-01
1   2010-01-01
2   2010-01-01
dtype: datetime64[s]
>>> s.describe()
count                                3
mean     2006-09-01 08:00:00.000000000
min      2000-01-01 00:00:00.000000000
25%      2004-12-31 12:00:00.000000000
50%      2010-01-01 00:00:00.000000000
75%      2010-01-01 00:00:00.000000000
max      2010-01-01 00:00:00.000000000
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = cudf.DataFrame({"categorical": cudf.Series(['d', 'e', 'f'],
...                         dtype='category'),
...                      "numeric": [1, 2, 3],
...                      "object": ['a', 'b', 'c']
... })
>>> df
  categorical  numeric object
0           d        1      a
1           e        2      b
2           f        3      c
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical numeric object
count            3     3.0      3
unique           3    <NA>      3
top              d    <NA>      a
freq             1    <NA>      1
mean          <NA>     2.0   <NA>
std           <NA>     1.0   <NA>
min           <NA>     1.0   <NA>
25%           <NA>     1.5   <NA>
50%           <NA>     2.0   <NA>
75%           <NA>     2.5   <NA>
max           <NA>     3.0   <NA>

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              d      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical numeric
count            3     3.0
unique           3    <NA>
top              d    <NA>
freq             1    <NA>
mean          <NA>     2.0
std           <NA>     1.0
min           <NA>     1.0
25%           <NA>     1.5
50%           <NA>     2.0
75%           <NA>     2.5
max           <NA>     3.0
div(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df.truediv(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df / 10
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
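
The fill_value rule (substitute when one side is missing, stay missing when both sides are) can be sketched in plain Python with None as the missing marker. The helper div_with_fill is hypothetical and ignores alignment and division by zero.

```python
def div_with_fill(left, right, fill_value=None):
    """Element-wise true division where None marks a missing value."""
    result = []
    for a, b in zip(left, right):
        if a is None and b is None:
            result.append(None)  # missing on both sides stays missing
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        result.append(None if a is None or b is None else a / b)
    return result


print(div_with_fill([10, None, 30, None], [2, 4, None, None], fill_value=1))
# [5.0, 0.25, 30.0, None]
print(div_with_fill([10, None, 30, None], [2, 4, None, None]))
# [5.0, None, None, None]
```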
drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters
labelssingle label or list-like

Index or column labels to drop.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

indexsingle label or list-like

Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columnssingle label or list-like

Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

levelint or level name, optional

For MultiIndex, level from which the labels will be removed.

inplacebool, default False

If False, return a copy. Otherwise, do operation inplace and return None.

errors{‘ignore’, ‘raise’}, default ‘raise’

If ‘ignore’, suppress error and only existing labels are dropped.

Returns
DataFrame

DataFrame without the removed index or column labels.

Raises
KeyError

If any of the labels is not found in the selected axis.

See also

DataFrame.loc

Label-location based indexer for selection by label.

DataFrame.dropna

Return DataFrame with labels on given axis omitted where (all or any) data are missing.

DataFrame.drop_duplicates

Return DataFrame with duplicate rows removed, optionally only considering certain columns.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 2, 3, 4],
...                      "B": [5, 6, 7, 8],
...                      "C": [10, 11, 12, 13],
...                      "D": [20, 30, 40, 50]})
>>> df
   A  B   C   D
0  1  5  10  20
1  2  6  11  30
2  3  7  12  40
3  4  8  13  50

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  1  20
1  2  30
2  3  40
3  4  50
>>> df.drop(columns=['B', 'C'])
   A   D
0  1  20
1  2  30
2  3  40
3  4  50

Drop rows by index

>>> df.drop([0, 1])
   A  B   C   D
2  3  7  12  40
3  4  8  13  50

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = cudf.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = cudf.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                 big  small
lama   speed    45.0   30.0
       weight  200.0  100.0
       length    1.5    1.0
cow    speed    30.0   20.0
       weight  250.0  150.0
       length    1.5    0.8
falcon speed   320.0  250.0
       weight    1.0    0.8
       length    0.3    0.2
>>> df.drop(index='cow', columns='small')
                 big
lama   speed    45.0
       weight  200.0
       length    1.5
falcon speed   320.0
       weight    1.0
       length    0.3
>>> df.drop(index='length', level=1)
                 big  small
lama   speed    45.0   30.0
       weight  200.0  100.0
cow    speed    30.0   20.0
       weight  250.0  150.0
falcon speed   320.0  250.0
       weight    1.0    0.8
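
The effect of the errors parameter can be sketched on a plain list of labels. The helper drop_labels is a hypothetical stand-in written for this illustration, not the cuDF implementation.

```python
def drop_labels(labels, to_drop, errors="raise"):
    """Return labels without those in to_drop, mimicking errors='raise'/'ignore'."""
    missing = [lab for lab in to_drop if lab not in labels]
    if missing and errors == "raise":
        raise KeyError(f"{missing} not found in axis")
    # With errors='ignore', unknown labels are skipped silently.
    return [lab for lab in labels if lab not in to_drop]


columns = ["A", "B", "C", "D"]
print(drop_labels(columns, ["B", "C"]))                   # ['A', 'D']
print(drop_labels(columns, ["B", "Z"], errors="ignore"))  # ['A', 'C', 'D']
```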
drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)

Return DataFrame with duplicate rows removed, optionally only considering certain subset of columns.

Parameters
subsetcolumn label or sequence of labels, optional

Only consider certain columns for identifying duplicates, by default use all of the columns.

keep{‘first’, ‘last’, False}, default ‘first’

Determines which duplicates (if any) to keep.

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

inplacebool, default False

Whether to drop duplicates in place or to return a copy.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns
DataFrame or None

DataFrame with duplicates removed or None if inplace=True.

Examples

>>> import cudf
>>> df = cudf.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns. Note that order of the rows being returned is not guaranteed to be sorted.

>>> df.drop_duplicates()
     brand style  rating
2  Indomie   cup     3.5
4  Indomie  pack     5.0
3  Indomie  pack    15.0
0  Yum Yum   cup     4.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
2  Indomie   cup     3.5
0  Yum Yum   cup     4.0

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
2  Indomie   cup     3.5
4  Indomie  pack     5.0
1  Yum Yum   cup     4.0
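
The three keep options can be sketched in plain Python by grouping row positions on the subset columns. The helper kept_positions is hypothetical; it returns positions in sorted order, whereas the row order returned by cuDF is not guaranteed.

```python
from collections import defaultdict


def kept_positions(rows, subset, keep="first"):
    """Return the positions of the rows that survive de-duplication."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")
    groups = defaultdict(list)
    for pos, row in enumerate(rows):
        groups[tuple(row[col] for col in subset)].append(pos)
    kept = []
    for positions in groups.values():
        if keep == "first":
            kept.append(positions[0])
        elif keep == "last":
            kept.append(positions[-1])
        elif len(positions) == 1:  # keep=False drops every duplicated row
            kept.append(positions[0])
    return sorted(kept)


rows = [
    {"brand": "Yum Yum", "style": "cup", "rating": 4.0},
    {"brand": "Yum Yum", "style": "cup", "rating": 4.0},
    {"brand": "Indomie", "style": "cup", "rating": 3.5},
    {"brand": "Indomie", "style": "pack", "rating": 15.0},
    {"brand": "Indomie", "style": "pack", "rating": 5.0},
]
print(kept_positions(rows, ["brand", "style", "rating"]))     # [0, 2, 3, 4]
print(kept_positions(rows, ["brand", "style"], keep="last"))  # [1, 2, 4]
print(kept_positions(rows, ["brand"], keep=False))            # []
```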
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Drop rows (or columns) containing nulls.

Parameters
axis{0, 1}, optional

Whether to drop rows (axis=0, default) or columns (axis=1) containing nulls.

how{“any”, “all”}, optional

Specifies how to decide whether to drop a row (or column). any (default) drops rows (or columns) containing at least one null value. all drops only rows (or columns) containing all null values.

thresh: int, optional

If specified, drops every row (or column) containing fewer than thresh non-null values.

subsetlist, optional

List of columns to consider when dropping rows (all columns are considered by default). Alternatively, when dropping columns, subset is a list of rows to consider.

inplacebool, default False

If True, do operation inplace and return None.

Returns
Copy of the DataFrame with rows/columns containing nulls dropped.

See also

cudf.core.dataframe.DataFrame.isna

Indicate null values.

cudf.core.dataframe.DataFrame.notna

Indicate non-null values.

cudf.core.dataframe.DataFrame.fillna

Replace null values.

cudf.core.series.Series.dropna

Drop null values.

cudf.core.index.Index.dropna

Drop null indices.

Examples

>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": ['Batmobile', None, 'Bullwhip'],
...                    "born": [np.datetime64("1940-04-25"),
...                             np.datetime64("NaT"),
...                             np.datetime64("NaT")]})
>>> df
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
1    Batman       <NA>                 <NA>
2  Catwoman   Bullwhip                 <NA>

Drop the rows where at least one element is null.

>>> df.dropna()
     name        toy       born
0  Alfred  Batmobile 1940-04-25

Drop the columns where at least one element is null.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are null.

>>> df.dropna(how='all')
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
1    Batman       <NA>                 <NA>
2  Catwoman   Bullwhip                 <NA>

Keep only the rows with at least 2 non-null values.

>>> df.dropna(thresh=2)
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
2  Catwoman   Bullwhip                 <NA>

Define in which columns to look for null values.

>>> df.dropna(subset=['name', 'born'])
     name        toy       born
0  Alfred  Batmobile 1940-04-25

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
0  Alfred  Batmobile 1940-04-25
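
How the how and thresh parameters decide whether a row survives can be sketched in plain Python, with None as the null marker. The helper keep_row is hypothetical, and letting thresh take precedence over how is an assumption of this sketch.

```python
def keep_row(row, how="any", thresh=None):
    """Decide whether a row is kept, following the how/thresh descriptions."""
    non_null = sum(value is not None for value in row)
    if thresh is not None:
        return non_null >= thresh    # need at least thresh non-null values
    if how == "any":
        return non_null == len(row)  # a single null drops the row
    if how == "all":
        return non_null > 0          # dropped only if every value is null
    raise ValueError("how must be 'any' or 'all'")


rows = [
    ("Alfred", "Batmobile", "1940-04-25"),
    ("Batman", None, None),
    ("Catwoman", "Bullwhip", None),
]
print([i for i, r in enumerate(rows) if keep_row(r)])             # [0]
print([i for i, r in enumerate(rows) if keep_row(r, how="all")])  # [0, 1, 2]
print([i for i, r in enumerate(rows) if keep_row(r, thresh=2)])   # [0, 2]
```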
property dtypes

Return the dtypes in this object.

Returns
pandas.Series

The data type of each column.

Examples

>>> import cudf
>>> import pandas as pd
>>> df = cudf.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df
   float  int   datetime string
0    1.0    1 2018-03-10    foo
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[us]
string              object
dtype: object
property empty

Indicator whether DataFrame or Series is empty.

True if DataFrame/Series is entirely empty (no items), meaning any of the axes are of length 0.

Returns
outbool

If DataFrame/Series is empty, return True, if not return False.

Notes

If DataFrame/Series contains only null values, it is still not considered empty. See the example below.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'A' : []})
>>> df
Empty DataFrame
Columns: [A]
Index: []
>>> df.empty
True

If we only have null values in our DataFrame, it is not considered empty! We will need to drop the nulls to make the DataFrame empty:

>>> df = cudf.DataFrame({'A' : [None, None]})
>>> df
      A
0  <NA>
1  <NA>
>>> df.empty
False
>>> df.dropna().empty
True

Non-empty and empty Series example:

>>> s = cudf.Series([1, 2, None])
>>> s
0       1
1       2
2    <NA>
dtype: int64
>>> s.empty
False
>>> s = cudf.Series([])
>>> s
Series([], dtype: float64)
>>> s.empty
True
equals(other)

Test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.

Parameters
otherSeries or DataFrame

The other Series or DataFrame to be compared with the first.

Returns
bool

True if all elements are the same in both objects, False otherwise.

Examples

>>> import cudf

Comparing Series with equals:

>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False

Comparing DataFrames with equals:

>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

For two DataFrames to compare equal, the types of column values must be equal, but the types of column labels need not:

>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
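
The inverse relationship between exp and log stated above holds only up to floating-point rounding. A standard-library sketch with illustrative variable names:

```python
import math

values = [-1.0, 0.0, 1.0, 0.32434, 0.5]
exponentials = [math.exp(v) for v in values]
round_trip = [math.log(e) for e in exponentials]

for v, e, r in zip(values, exponentials, round_trip):
    # Compare with a tolerance rather than with ==.
    print(v, e, math.isclose(v, r, abs_tol=1e-12))
```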
explode(column, ignore_index=False)

Transform each element of a list-like to a row, replicating index values.

Parameters
columnstr or tuple

Column to explode.

ignore_indexbool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.

Returns
DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     {"a": [[1, 2, 3], [], None, [4, 5]], "b": [11, 22, 33, 44]})
>>> df
           a   b
0  [1, 2, 3]  11
1         []  22
2       None  33
3     [4, 5]  44
>>> df.explode('a')
      a   b
0     1  11
0     2  11
0     3  11
1  <NA>  22
2  <NA>  33
3     4  44
3     5  44
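
The row replication shown above can be sketched in plain Python on (index, a, b) tuples. The helper explode_rows is hypothetical; it mirrors the output above, where an empty list and a null each produce a single row with a missing value.

```python
def explode_rows(rows):
    """rows: list of (index, list_or_None, other). Returns exploded tuples."""
    exploded = []
    for index, items, other in rows:
        if not items:  # None or empty list: one row, value missing
            exploded.append((index, None, other))
        else:
            for item in items:  # one output row per list element
                exploded.append((index, item, other))
    return exploded


rows = [(0, [1, 2, 3], 11), (1, [], 22), (2, None, 33), (3, [4, 5], 44)]
for row in explode_rows(rows):
    print(row)
```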
fillna(value=None, method=None, axis=None, inplace=False, limit=None)

Fill null values with value or specified method.

Parameters
valuescalar, Series-like or dict

Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns. Cannot be used with method.

method{‘ffill’, ‘bfill’}, default None

Method to use for filling null values in the dataframe or series. ffill propagates the last non-null values forward to the next non-null value. bfill propagates backward with the next non-null value. Cannot be used with value.

Returns
resultDataFrame

Copy with nulls filled.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  <NA>
2  <NA>     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5

fillna on a Series object:

>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    <NA>
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object

fillna also supports in-place operation:

>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5

fillna specified with fill method

>>> ser = cudf.Series([1, None, None, 2, 3, None, None])
>>> ser.fillna(method='ffill')
0    1
1    1
2    1
3    2
4    3
5    3
6    3
dtype: int64
>>> ser.fillna(method='bfill')
0       1
1       2
2       2
3       2
4       3
5    <NA>
6    <NA>
dtype: int64
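
The two fill methods can be sketched in plain Python with None as the null marker. The helpers ffill and bfill are hypothetical names for this illustration; leading nulls (for ffill) and trailing nulls (for bfill) stay null because there is no value to propagate.

```python
def ffill(values):
    """Propagate the last non-null value forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bfill(values):
    """Propagate the next non-null value backward (ffill on the reversed list)."""
    return ffill(values[::-1])[::-1]


data = [1, None, None, 2, 3, None, None]
print(ffill(data))  # [1, 1, 1, 2, 3, 3, 3]
print(bfill(data))  # [1, 2, 2, 2, 3, None, None]
```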
floordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df.floordiv(2)
           angles  degrees
circle          0      180
triangle        1       90
rectangle       2      180
>>> df // 2
           angles  degrees
circle          0      180
triangle        1       90
rectangle       2      180
classmethod from_arrow(table)

Convert from PyArrow Table to DataFrame.

Parameters
tablePyArrow Table Object

PyArrow Table Object which has to be converted to cudf DataFrame.

Returns
cudf DataFrame
Raises
TypeError for invalid input type.

Notes

  • Does not support automatically setting index column(s) similar to how to_pandas works for PyArrow Tables.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> data = pa.table({"a":[1, 2, 3], "b":[4, 5, 6]})
>>> cudf.DataFrame.from_arrow(data)
   a  b
0  1  4
1  2  5
2  3  6
classmethod from_pandas(dataframe, nan_as_null=None)

Convert from a Pandas DataFrame.

Parameters
dataframePandas DataFrame object

A Pandas DataFrame object which has to be converted to cuDF DataFrame.

nan_as_nullbool, Default True

If True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> data = [[0,1], [1,2], [3,4]]
>>> pdf = pd.DataFrame(data, columns=['a', 'b'], dtype=int)
>>> cudf.from_pandas(pdf)
   a  b
0  0  1
1  1  2
2  3  4
classmethod from_records(data, index=None, columns=None, nan_as_null=False)

Convert structured or record ndarray to DataFrame.

Parameters
datanumpy structured dtype or recarray of ndim=2
indexstr, array-like

The name of the index column in data. If None, the default index is used.

columnslist of str

List of column names to include.

Returns
DataFrame
hash_columns(columns=None)

Hash the given columns and return a new device array

Parameters
columnssequence of str; optional

Sequence of column names. If columns is None (unspecified), all columns in the frame are used.

head(n=5)

Returns the first n rows as a new DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df.head(2)
   key   val
0    0  10.0
1    1  11.0
property iat

Alias for DataFrame.iloc; provided for compatibility with Pandas.

property iloc

Select rows and columns by position.

See also

DataFrame.loc

Notes

One notable difference from Pandas arises when the DataFrame has mixed dtypes and Pandas would return a Series: cuDF returns a DataFrame instead, because it does not yet support mixed types in a Series.

Mixed dtype single row output as a dataframe (pandas results in Series)

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3], "b":["a", "b", "c"]})
>>> df.iloc[0]
   a  b
0  1  a

Examples

>>> df = cudf.DataFrame({'a': range(20),
...                      'b': range(20),
...                      'c': range(20)})

Select a single row using an integer index.

>>> df.iloc[1]
a    1
b    1
c    1
Name: 1, dtype: int64

Select multiple rows using a list of integers.

>>> df.iloc[[0, 2, 9, 18]]
      a    b    c
 0    0    0    0
 2    2    2    2
 9    9    9    9
18   18   18   18

Select rows using a slice.

>>> df.iloc[3:10:2]
     a    b    c
3    3    3    3
5    5    5    5
7    7    7    7
9    9    9    9

Select both rows and columns.

>>> df.iloc[[1, 3, 5, 7], 2]
1    1
3    3
5    5
7    7
Name: c, dtype: int64

Setting values in the first four rows using iloc.

>>> df.iloc[:4] = 0
>>> df
   a  b  c
0  0  0  0
1  0  0  0
2  0  0  0
3  0  0  0
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
9  9  9  9
[10 more rows]
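
The row selections above are consistent with ordinary Python positional indexing (zero-based, slice stop exclusive). A plain-list sketch of the same selections, offered as an analogy and not as a description of cuDF internals:

```python
positions = list(range(20))

single = positions[1]                             # one position
picked = [positions[i] for i in [0, 2, 9, 18]]    # list of positions
sliced = positions[3:10:2]                        # start 3, stop 10 (exclusive), step 2

print(single)  # 1
print(picked)  # [0, 2, 9, 18]
print(sliced)  # [3, 5, 7, 9]
```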
property index

Returns the index of the DataFrame

info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)

Print a concise summary of a DataFrame.

This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.

Parameters
verbosebool, optional

Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.

bufwritable buffer, defaults to sys.stdout

Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.

max_colsint, optional

When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.

memory_usagebool, str, optional

Specifies whether the total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting. True always shows memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, memory is estimated from the column dtype and the number of rows, assuming that values of the same dtype consume the same amount of memory. With deep introspection, the real memory usage is calculated at the cost of additional computation.

null_countsbool, optional

Whether to show the non-null counts. By default, this is shown only if the frame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.

Returns
None

This method prints a summary of a DataFrame and returns None.

See also

DataFrame.describe

Generate descriptive statistics of DataFrame columns.

DataFrame.memory_usage

Memory usage of DataFrame columns.

Examples

>>> import cudf
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = cudf.DataFrame({"int_col": int_values,
...                     "text_col": text_values,
...                     "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

Print information for all columns:

>>> df.info(verbose=True)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes

Print a summary of the column count and dtypes, without per-column information:

>>> df.info(verbose=False)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes

Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:
...     f.write(s)
...
369

The memory_usage parameter enables deep introspection mode, which is especially useful for big DataFrames and for fine-tuning memory optimization:

>>> import numpy as np
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = cudf.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info(memory_usage='deep')
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 14.3 MB
insert(loc, name, value)

Add a column to DataFrame at the index specified by loc.

Parameters
locint

Location to insert by index; cannot be greater than num columns + 1.

namenumber or string

name or label of column to be inserted

valueSeries or array-like
interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
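
As a rough illustration of the column-major to row-major conversion described above, the following plain-Python sketch interleaves equal-length columns. It is not cuDF's implementation; the helper name `interleave` and the list-based columns are assumptions made for the example.

```python
def interleave(columns):
    """Interleave equal-length columns into one row-major list."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flatten the rows in order.
    return [value for row in zip(*columns) for value in row]


# Hypothetical data: two columns of three rows each.
col_a = ["A1", "A2", "A3"]
col_b = ["B1", "B2", "B3"]
interleaved = interleave([col_a, col_b])
print(interleaved)
```

The values come out in the same order as in the example above: A1, B1, A2, B2, A3, B3.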
isin(values)

Whether each element in the DataFrame is contained in values.

Parameters
valuesiterable, Series, DataFrame or dict

The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

Returns
DataFrame:

DataFrame of booleans showing whether each element in the DataFrame is contained in values.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                     index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings):

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame, the index and column labels must match. Note that ‘dog’ has no matching row in other, so both of its entries are False.

>>> other = cudf.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                         index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
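
The list and dict cases can be sketched in plain Python over a dict of columns. This only illustrates the matching rules described above; the helpers `isin_list` and `isin_dict` are hypothetical and are not part of cuDF.

```python
def isin_list(frame, values):
    """Mark every cell whose value appears in `values`."""
    allowed = set(values)
    return {col: [v in allowed for v in cells] for col, cells in frame.items()}


def isin_dict(frame, values_by_column):
    """Check each column against its own list; unlisted columns are all False."""
    result = {}
    for col, cells in frame.items():
        allowed = set(values_by_column.get(col, ()))
        result[col] = [v in allowed for v in cells]
    return result


# Same data as the example above, stored as a dict of columns
# (row order: falcon, dog).
frame = {"num_legs": [2, 4], "num_wings": [2, 0]}
by_list = isin_list(frame, [0, 2])
by_dict = isin_dict(frame, {"num_wings": [0, 3]})
print(by_list)
print(by_dict)
```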
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float data) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
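
The rules above can be sketched in plain Python, assuming `None` stands in for an entry whose null mask is set. The helper `is_missing` is hypothetical and only mirrors the documented behaviour for scalars.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN; everything else is not missing."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


# Empty strings and infinities are ordinary values, not missing ones.
values = [5.0, None, float("nan"), "", float("inf"), -float("inf")]
flags = [is_missing(v) for v in values]
print(flags)
```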
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float data) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
iteritems()

Iterate over (column name, Series) pairs.

join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, method='hash')

Join columns with other DataFrame on index or on a key column.

Parameters
otherDataFrame
howstr

Only accepts “left”, “right”, “inner”, “outer”

lsuffix, rsuffixstr

The suffixes to add to the left (lsuffix) and right (rsuffix) column names when avoiding conflicts.

sortbool

Set to True to ensure sorted ordering.

Returns
joinedDataFrame

Notes

Difference from pandas:

  • other must be a single DataFrame for now.

  • on is not supported yet due to lack of multi-index support.

keys()

Get the columns. This is the index for a Series and the columns for a DataFrame.

Returns
Index

Columns of DataFrame.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'one' : [1, 2, 3], 'five' : ['a', 'b', 'c']})
>>> df
   one five
0    1    a
1    2    b
2    3    c
>>> df.keys()
Index(['one', 'five'], dtype='object')
>>> df = cudf.DataFrame(columns=[0, 1, 2, 3])
>>> df
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
>>> df.keys()
Int64Index([0, 1, 2, 3], dtype='int64')
kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
skipna: bool, default True

Exclude NA/null values when computing the result.

Returns
Series

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurt()
a   -1.2
b   -1.2
dtype: float64
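
For reference, the standard bias-corrected sample formula for Fisher's (excess) kurtosis can be written in plain Python as below. That cuDF computes the statistic this way internally is an assumption; the hypothetical helper `fisher_kurtosis` is only meant to show where the -1.2 in the example comes from.

```python
def fisher_kurtosis(values):
    """Unbiased sample excess kurtosis (normal distribution == 0.0)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    variance = m2 / (n - 1)                    # unbiased sample variance
    if variance == 0:
        return float("nan")                    # undefined for constant data
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / variance ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return term - correction


kurt_a = fisher_kurtosis([1, 2, 3, 4])
kurt_b = fisher_kurtosis([7, 8, 9, 10])
print(round(kurt_a, 6), round(kurt_b, 6))
```

Both columns give the same value because kurtosis is unaffected by shifting the data by a constant.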
kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
skipna: bool, default True

Exclude NA/null values when computing the result.

Returns
Series

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurt()
a   -1.2
b   -1.2
dtype: float64
label_encoding(column, prefix, cats, prefix_sep='_', dtype=None, na_sentinel=-1)

Encode labels in a column with label encoding.

Parameters
columnstr

the source column with binary encoding for the data.

prefixstr

the new column name prefix.

catssequence of ints

the sequence of categories as integers.

prefix_sepstr

the separator between the prefix and the category.

dtype :

the dtype for the outputs; see Series.label_encoding

na_sentinelnumber

Value to indicate missing category.

Returns
A new DataFrame with a new column appended for the coded values.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a':[1, 2, 3], 'b':[10, 10, 20]})
>>> df
   a   b
0  1  10
1  2  10
2  3  20
>>> df.label_encoding(column="b", prefix="b_col", cats=[10, 20])
   a   b  b_col_labels
0  1  10             0
1  2  10             0
2  3  20             1
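
The mapping from values to codes can be sketched in plain Python: each value is replaced by its position in cats, and values not found in cats receive na_sentinel. The helper `encode_labels` is hypothetical and is not the cuDF implementation.

```python
def encode_labels(values, cats, na_sentinel=-1):
    """Map each value to its position in `cats`, or `na_sentinel` if absent."""
    positions = {cat: i for i, cat in enumerate(cats)}
    return [positions.get(v, na_sentinel) for v in values]


# Column 'b' from the example above.
codes = encode_labels([10, 10, 20], cats=[10, 20])
# A value that is not listed in cats falls back to the sentinel.
unknown = encode_labels([10, 30], cats=[10, 20])
print(codes, unknown)
```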
property loc

Selecting rows and columns by label or boolean mask.

See also

DataFrame.iloc

Notes

One notable difference from pandas: when a DataFrame has mixed dtypes, pandas returns a single row as a Series, whereas cuDF returns a DataFrame because it does not yet support mixed types in a Series.

A single row of a mixed-dtype DataFrame is returned as a DataFrame (pandas returns a Series):

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3], "b":["a", "b", "c"]})
>>> df.loc[0]
   a  b
0  1  a

Examples

DataFrame with string index.

>>> df = cudf.DataFrame({'a': range(5), 'b': range(5, 10)},
...                     index=['a', 'b', 'c', 'd', 'e'])
>>> df
   a  b
a  0  5
b  1  6
c  2  7
d  3  8
e  4  9

Select a single row by label.

>>> df.loc['a']
a    0
b    5
Name: a, dtype: int64

Select multiple rows and a single column.

>>> df.loc[['a', 'c', 'e'], 'b']
a    5
c    7
e    9
Name: b, dtype: int64

Selection by boolean mask.

>>> df.loc[df.a > 2]
   a  b
d  3  8
e  4  9

Setting values using loc.

>>> df.loc[['a', 'c', 'e'], 'a'] = 0
>>> df
   a  b
a  0  5
b  1  6
c  0  7
d  3  8
e  0  9
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
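
The replacement rule for the Series case can be sketched in plain Python, with `None` standing in for <NA>. The function below is a hypothetical illustration of the semantics with a scalar other, not cuDF code.

```python
def mask(values, cond, other=None):
    """Replace entries where `cond` is True with `other`; keep the rest."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else v for v, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
replaced = mask(ser, cond, 10)   # scalar replacement
nulled = mask(ser, cond)         # default other=None marks entries as missing
print(replaced, nulled)
```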
max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the maximum of the values in the DataFrame.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

level: int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_only: bool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.

Returns
Series

Notes

Parameters currently not supported are level, numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.max()
a     4
b    10
dtype: int64
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}

Axis for the function to be applied on.

skipnabool, default True

Exclude NA/null values when computing the result.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_onlybool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Returns
meanSeries or DataFrame (if level specified)

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.mean()
a    2.5
b    8.5
dtype: float64
melt(**kwargs)

Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

Parameters
frameDataFrame
id_varstuple, list, or ndarray, optional

Column(s) to use as identifier variables. default: None

value_varstuple, list, or ndarray, optional

Column(s) to unpivot. default: all columns that are not set as id_vars.

var_namescalar

Name to use for the variable column. default: frame.columns.name or ‘variable’

value_namestr

Name to use for the value column. default: ‘value’

Returns
outDataFrame

Melted result
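
To make the wide-to-long reshaping concrete, here is a plain-Python sketch over a dict of columns. The function and the sample data are hypothetical; the parameter names mirror the ones documented above, and the row order (one block per unpivoted column) follows the pandas convention, which is assumed to apply here as well.

```python
def melt(frame, id_vars, value_vars=None, var_name="variable", value_name="value"):
    """Unpivot a dict-of-columns frame from wide to long format."""
    if value_vars is None:
        # Default: every column that is not an identifier column.
        value_vars = [c for c in frame if c not in id_vars]
    n_rows = len(next(iter(frame.values()), []))
    out = {name: [] for name in [*id_vars, var_name, value_name]}
    for var in value_vars:
        for row in range(n_rows):
            for id_col in id_vars:
                out[id_col].append(frame[id_col][row])
            out[var_name].append(var)
            out[value_name].append(frame[var][row])
    return out


wide = {"id": [1, 2], "x": [10, 20], "y": [30, 40]}
long = melt(wide, id_vars=["id"])
print(long)
```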

memory_usage(index=True, deep=False)

Return the memory usage of each column in bytes. The memory usage can optionally include the contribution of the index and elements of object dtype.

Parameters
indexbool, default True

Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.

deepbool, default False

If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.

Returns
Series

A Series whose index is the original column names and whose values are the memory usage of each column in bytes.

Examples

>>> import numpy as np
>>> import cudf
>>> dtypes = ['int64', 'float64', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000).astype(t))
...              for t in dtypes])
>>> df = cudf.DataFrame(data)
>>> df.head()
    int64  float64  object  bool
0      1      1.0     1.0  True
1      1      1.0     1.0  True
2      1      1.0     1.0  True
3      1      1.0     1.0  True
4      1      1.0     1.0  True
>>> df.memory_usage(index=False)
int64      40000
float64    40000
object     40000
bool        5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5048
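
For fixed-width dtypes, the figures in the example above are simply the row count multiplied by the element size. The arithmetic below is a sketch under that assumption; it ignores null masks, the index, and variable-width (object or string) columns, and the names `ITEMSIZE` and `estimate_bytes` are hypothetical.

```python
# Bytes per element for a few fixed-width dtypes.
ITEMSIZE = {"int64": 8, "float64": 8, "bool": 1}


def estimate_bytes(n_rows, dtype):
    """Estimate column size as rows times element width."""
    return n_rows * ITEMSIZE[dtype]


usage = {dtype: estimate_bytes(5000, dtype) for dtype in ITEMSIZE}
print(usage)
```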
merge(right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, how='inner', sort=False, lsuffix=None, rsuffix=None, method='hash', indicator=False, suffixes=('_x', '_y'))

Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.

Parameters
rightDataFrame
onlabel or list; defaults to None

Column or index level names to join on. These must be found in both DataFrames.

If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

how{‘left’, ‘outer’, ‘inner’}, default ‘inner’

Type of merge to be performed.

  • left : use only keys from left frame, similar to a SQL left outer join.

  • right : not supported.

  • outer : use union of keys from both frames, similar to a SQL full outer join.

  • inner: use intersection of keys from both frames, similar to a SQL inner join.

left_onlabel or list, or array-like

Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_onlabel or list, or array-like

Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

left_indexbool, default False

Use the index from the left DataFrame as the join key(s).

right_indexbool, default False

Use the index from the right DataFrame as the join key.

sortbool, default False

Sort the resulting dataframe by the columns that were merged on, starting from the left.

suffixes: Tuple[str, str], defaults to (‘_x’, ‘_y’)

Suffixes applied to overlapping column names on the left and right sides

method{‘hash’, ‘sort’}, default ‘hash’

The implementation method to be used for the operation.

Returns
mergedDataFrame

Notes

DataFrame merges in cuDF result in non-deterministic row ordering.

Examples

>>> import cudf
>>> df_a = cudf.DataFrame()
>>> df_a['key'] = [0, 1, 2, 3, 4]
>>> df_a['vals_a'] = [float(i + 10) for i in range(5)]
>>> df_b = cudf.DataFrame()
>>> df_b['key'] = [1, 2, 4]
>>> df_b['vals_b'] = [float(i+10) for i in range(3)]
>>> df_merged = df_a.merge(df_b, on=['key'], how='left')
>>> df_merged.sort_values('key')  
   key  vals_a  vals_b
3    0    10.0
0    1    11.0    10.0
1    2    12.0    11.0
4    3    13.0
2    4    14.0    12.0
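
The left merge above can be mimicked with a generic build-and-probe hash join in plain Python. This is only a sketch of the idea behind a hash-based join, not cuDF's GPU implementation; `left_merge` is a hypothetical helper that takes rows as dicts, uses `None` for missing values, and does not handle overlapping column names.

```python
def left_merge(left_rows, right_rows, key):
    """Left join two lists of row dicts on `key` with a hash lookup."""
    # Build phase: group right-hand rows by key value.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], []).append(row)
    right_cols = [c for row in right_rows[:1] for c in row if c != key]
    merged = []
    # Probe phase: every left row is kept; unmatched rows get None.
    for row in left_rows:
        matches = lookup.get(row[key])
        if matches is None:
            merged.append({**row, **{c: None for c in right_cols}})
        else:
            for match in matches:
                merged.append({**row, **{c: match[c] for c in right_cols}})
    return merged


left = [{"key": k, "vals_a": float(k + 10)} for k in range(5)]
right = [{"key": k, "vals_b": float(i + 10)} for i, k in enumerate([1, 2, 4])]
merged = left_merge(left, right, "key")
for row in merged:
    print(row)
```

Unlike the cuDF result, this sketch preserves the order of the left rows, so no sort is needed to compare it with the table above.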

Merging on categorical variables is only allowed in certain cases.

Categorical variable typecasting logic depends on both how and the specifics of the categorical variables to be merged. Merging categorical variables when only one side is ordered is ambiguous and not allowed. Merging when both categoricals are ordered is allowed, but only when the categories are exactly equal and have equal ordering, and will result in the common dtype. When both sides are unordered, the result categorical depends on the kind of join:

  • For inner joins, the result will be the intersection of the categories.

  • For left or right joins, the result will be the left or right dtype respectively. This extends to semi and anti joins.

  • For outer joins, the result will be the union of categories from both sides.

min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values in the DataFrame.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

level: int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_only: bool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.

Returns
Series

Notes

Parameters currently not supported are level, numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.min()
a    1
b    7
dtype: int64
mod(other, axis='columns', level=None, fill_value=None)

Get Modulo division of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df % 100
           angles  degrees
circle          0       60
triangle        3       80
rectangle       4       60
>>> df.mod(100)
           angles  degrees
circle          0       60
triangle        3       80
rectangle       4       60
mode(axis=0, numeric_only=False, dropna=True)

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to iterate over while searching for the mode:

  • 0 or ‘index’ : get mode of each column

  • 1 or ‘columns’ : get mode of each row.

numeric_onlybool, default False

If True, only apply to numeric columns.

dropnabool, default True

Don’t consider counts of NA/NaN/NaT.

Returns
DataFrame

The modes of each column or row.

See also

cudf.core.series.Series.mode

Return the highest frequency value in a Series.

cudf.core.series.Series.value_counts

Return the counts of values in a Series.

Notes

axis parameter is currently not supported.

Examples

>>> import cudf
>>> df = cudf.DataFrame({
...     "species": ["bird", "mammal", "arthropod", "bird"],
...     "legs": [2, 4, 8, 2],
...     "wings": [2.0, None, 0.0, None]
... })
>>> df
     species  legs wings
0       bird     2   2.0
1     mammal     4  <NA>
2  arthropod     8   0.0
3       bird     2  <NA>

By default, missing values are not considered, and the modes of wings are both 0 and 2. The second row of species and legs contains <NA> because each of those columns has only one mode, while the result has two rows.

>>> df.mode()
  species  legs  wings
0    bird     2    0.0
1    <NA>  <NA>    2.0

Setting dropna=False, NA values are considered and they can be the mode (like for wings).

>>> df.mode(dropna=False)
  species  legs wings
0    bird     2  <NA>

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0     2    0.0
1  <NA>    2.0
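
The effect of dropna on a single column can be sketched in plain Python with `collections.Counter`, using `None` in place of <NA>. The helper `modes` is hypothetical; it returns every value tied for the highest count, which is why one column can yield more than one mode.

```python
from collections import Counter


def modes(values, dropna=True):
    """Return all most-frequent values; None stands in for <NA>."""
    if dropna:
        values = [v for v in values if v is not None]
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    # Sort for a stable result, placing None (if present) last.
    return sorted(tied, key=lambda v: (v is None, v))


wings = [2.0, None, 0.0, None]
without_na = modes(wings)             # missing values ignored
with_na = modes(wings, dropna=False)  # missing values counted
species = modes(["bird", "mammal", "arthropod", "bird"])
print(without_na, with_na, species)
```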
mul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> other = cudf.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> df * other
           angles degrees
circle          0    <NA>
triangle        9    <NA>
rectangle      16    <NA>
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0        0
triangle        9        0
rectangle      16        0
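
The role of fill_value can be sketched for a single column in plain Python, with `None` marking a missing value. The helper `mul_with_fill` is hypothetical; it follows the rule stated above that a value missing on both sides stays missing even when fill_value is given.

```python
def mul_with_fill(left, right, fill_value=None):
    """Multiply element-wise; None marks a missing value."""
    result = []
    for a, b in zip(left, right):
        if a is None and b is None:
            result.append(None)  # missing on both sides stays missing
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        result.append(None if a is None or b is None else a * b)
    return result


degrees = [360, 180, 360]
missing = [None, None, None]  # `other` has no degrees column
plain = mul_with_fill(degrees, missing)
filled = mul_with_fill(degrees, missing, fill_value=0)
print(plain, filled)
```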
nans_to_nulls()

Convert nans (if any) to nulls.

property ndim

Dimension of the data. DataFrame ndim is always 2.

nlargest(n, columns, keep='first')

Return the first n rows ordered by columns in descending order.

Parameters
nint

Number of rows to return.

columnslabel or list of labels

Column label(s) to order by.

keep{‘first’, ‘last’}, default ‘first’

Where there are duplicate values:

  • first : prioritize the first occurrence(s)

  • last : prioritize the last occurrence(s)

Returns
DataFrame

The first n rows ordered by the given columns in descending order.

Notes

Difference from pandas:
  • Only a single column is supported in columns

Examples

>>> import cudf
>>> df = cudf.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI
>>> df.nlargest(3, 'population')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Malta       434000    12011      MT
>>> df.nlargest(3, 'population', keep='last')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN
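
The tie-breaking rule behind keep can be sketched in plain Python by sorting on the value and then on the original row position. The helper below is a hypothetical illustration over a list of row dicts, checked only against the example above; it is not how cuDF implements nlargest.

```python
def nlargest(rows, n, column, keep="first"):
    """Return the n rows with the largest `column` values."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    indexed = list(enumerate(rows))
    if keep == "first":
        # Among equal values, earlier rows win.
        indexed.sort(key=lambda item: (-item[1][column], item[0]))
    else:
        # Among equal values, later rows win.
        indexed.sort(key=lambda item: (-item[1][column], -item[0]))
    return [row for _, row in indexed[:n]]


rows = [
    {"name": "Italy", "population": 59000000},
    {"name": "France", "population": 65000000},
    {"name": "Malta", "population": 434000},
    {"name": "Maldives", "population": 434000},
    {"name": "Brunei", "population": 434000},
]
first = [r["name"] for r in nlargest(rows, 3, "population")]
last = [r["name"] for r in nlargest(rows, 3, "population", keep="last")]
print(first, last)
```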
notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float data) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
GenericIndex([True, True, False, False, True, True], dtype='bool')
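
The rules above can be sketched in plain Python. This is only an illustration of which values count as missing, not how cuDF evaluates the mask on the GPU; the helper name is_not_missing is hypothetical, and None stands in for an entry whose null mask is set.

```python
import math

def is_not_missing(value):
    """Plain-Python stand-in for one element of a notna() mask."""
    if value is None:
        # None plays the role of a masked (null) entry here.
        return False
    if isinstance(value, float) and math.isnan(value):
        # NaN in a float column counts as missing.
        return False
    # Everything else is present, including '' and inf.
    return True

values = [5.0, None, float("nan"), float("inf"), ""]
mask = [is_not_missing(v) for v in values]
print(mask)  # [True, False, False, True, True]
```
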
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
nsmallest(n, columns, keep='first')

Get the rows of the DataFrame sorted by the n smallest values of columns.

Parameters
nint

Number of items to retrieve.

columnslist or str

Column name or names to order by.

keep{‘first’, ‘last’}, default ‘first’

Where there are duplicate values:

  • first : take the first occurrence.

  • last : take the last occurrence.

Returns
DataFrame

Notes

Difference from pandas:
  • Only a single column is supported in columns

Examples

>>> import cudf
>>> df = cudf.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 337000,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru         337000      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru         337000  182      NR
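
The tie-breaking shown in the two outputs above can be reproduced with a stable sort in plain Python. This is a sketch that mirrors the documented results, not necessarily the algorithm cuDF runs; nsmallest_rows is a hypothetical helper and the data is the population column from the example.

```python
def nsmallest_rows(rows, n, keep="first"):
    """rows: list of (label, value) pairs in frame order."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # For keep='last', scan the rows back to front so that later
    # occurrences of a tied value are met first.
    ordered = rows if keep == "first" else rows[::-1]
    # sorted() is stable, so tied values keep their scan order.
    return sorted(ordered, key=lambda item: item[1])[:n]

population = [
    ("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
    ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
    ("Nauru", 337000), ("Tuvalu", 11300), ("Anguilla", 11300),
]

first = [label for label, _ in nsmallest_rows(population, 3)]
last = [label for label, _ in nsmallest_rows(population, 3, keep="last")]
print(first)  # ['Tuvalu', 'Anguilla', 'Iceland']
print(last)   # ['Anguilla', 'Tuvalu', 'Nauru']
```
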
one_hot_encoding(column, prefix, cats, prefix_sep='_', dtype='float64')

Expand a column with one-hot-encoding.

Parameters
columnstr

the source column with binary encoding for the data.

prefixstr

the new column name prefix.

catssequence of ints

the sequence of categories as integers.

prefix_sepstr

the separator between the prefix and the category.

dtype :

the dtype for the outputs; defaults to float64.

Returns
A new DataFrame with new columns appended for each category.

Examples

>>> import pandas as pd
>>> import cudf
>>> pet_owner = [1, 2, 3, 4, 5]
>>> pet_type = ['fish', 'dog', 'fish', 'bird', 'fish']
>>> df = pd.DataFrame({'pet_owner': pet_owner, 'pet_type': pet_type})
>>> df.pet_type = df.pet_type.astype('category')

Create a column with numerically encoded category values

>>> df['pet_codes'] = df.pet_type.cat.codes
>>> gdf = cudf.from_pandas(df)

Create the list of category codes to use in the encoding

>>> codes = gdf.pet_codes.unique()
>>> gdf.one_hot_encoding('pet_codes', 'pet_dummy', codes).head()
  pet_owner  pet_type  pet_codes  pet_dummy_0  pet_dummy_1  pet_dummy_2
0         1      fish          2          0.0          0.0          1.0
1         2       dog          1          0.0          1.0          0.0
2         3      fish          2          0.0          0.0          1.0
3         4      bird          0          1.0          0.0          0.0
4         5      fish          2          0.0          0.0          1.0
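
How the new column names and values are derived can be sketched without a GPU. The function below is a hypothetical plain-Python stand-in (it returns a dict of lists rather than a DataFrame) that builds one indicator column per category code, named prefix + prefix_sep + category.

```python
def one_hot(codes, prefix, cats, prefix_sep="_"):
    """Return {column name: list of 0.0/1.0 indicators} for each category."""
    return {
        f"{prefix}{prefix_sep}{cat}": [1.0 if code == cat else 0.0 for code in codes]
        for cat in cats
    }

# Category codes from the example above: bird=0, dog=1, fish=2.
pet_codes = [2, 1, 2, 0, 2]
encoded = one_hot(pet_codes, "pet_dummy", cats=[0, 1, 2])
print(encoded["pet_dummy_2"])  # [1.0, 0.0, 1.0, 0.0, 1.0]
```
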
partition_by_hash(columns, nparts, keep_index=True)

Partition the dataframe by the hashed value of data in columns.

Parameters
columnssequence of str

The names of the columns to be hashed. Must have at least one name.

npartsint

Number of output partitions

keep_indexboolean

Whether to keep the index or drop it

Returns
partitioned: list of DataFrame
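
The idea behind hash partitioning can be sketched in plain Python: rows whose key columns hold equal values always land in the same partition. The hash used here (zlib.crc32 over the repr of the key tuple) is an assumption chosen for determinism; cuDF's actual hash function is not described in this reference and will assign rows differently.

```python
import zlib

def partition_by_hash(rows, columns, nparts):
    """Split a list of row dicts into nparts lists by hashing the key columns."""
    if not columns:
        raise ValueError("columns must contain at least one name")
    parts = [[] for _ in range(nparts)]
    for row in rows:
        key = repr(tuple(row[c] for c in columns)).encode("utf-8")
        # Equal keys produce equal hashes, hence the same partition.
        parts[zlib.crc32(key) % nparts].append(row)
    return parts

rows = [
    {"key": "a", "val": 1},
    {"key": "b", "val": 2},
    {"key": "a", "val": 3},
    {"key": "c", "val": 4},
]
parts = partition_by_hash(rows, ["key"], nparts=3)
```

Some partitions may be empty; the only guarantee the sketch makes is that every row appears exactly once and equal keys stay together.
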
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple, where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
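
The two calling conventions can be sketched as a free function in plain Python. This follows the behaviour described above rather than cuDF's source; pipe, scale, and shift are hypothetical names, and a list stands in for the DataFrame.

```python
def pipe(obj, func, *args, **kwargs):
    """Apply func to obj, supporting the (callable, data_keyword) form."""
    if isinstance(func, tuple):
        target, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        # Pass the data under the keyword the callable expects.
        kwargs[data_keyword] = obj
        return target(*args, **kwargs)
    return func(obj, *args, **kwargs)

def scale(data, factor):
    return [x * factor for x in data]

def shift(offset, values):
    # Takes its data as the second argument, named 'values'.
    return [v + offset for v in values]

scaled = pipe([1, 2, 3], scale, factor=10)
result = pipe(scaled, (shift, "values"), offset=1)
print(result)  # [11, 21, 31]
```
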
pivot(index, columns, values=None)

Return reshaped DataFrame organized by the given index and column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame.

Parameters
indexcolumn name, optional

Column used to construct the index of the result.

columnscolumn name, optional

Column used to construct the columns of the result.

valuescolumn name or list of column names, optional

Column(s) whose values are rearranged to produce the result. If not specified, all remaining columns of the DataFrame are used.

Returns
DataFrame

Examples

>>> import cudf
>>> a = cudf.DataFrame()
>>> a['a'] = [1, 1, 2, 2]
>>> a['b'] = ['a', 'b', 'a', 'b']
>>> a['c'] = [1, 2, 3, 4]
>>> a.pivot(index='a', columns='b')
   c
b  a  b
a
1  1  2
2  3  4

Pivot with missing values in result:

>>> a = cudf.DataFrame()
>>> a['a'] = [1, 1, 2]
>>> a['b'] = [1, 2, 3]
>>> a['c'] = ['one', 'two', 'three']
>>> a.pivot(index='a', columns='b')
      c
b     1     2      3
a
1   one   two   <NA>
2  <NA>  <NA>  three
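
The reshaping in the second example can be sketched with nested dicts in plain Python. The helper below is hypothetical: it handles a single values column, uses None for cells with no source row, and rejects duplicate (index, columns) pairs, which is a choice made for the sketch rather than documented cuDF behaviour.

```python
def pivot(records, index, columns, values):
    """Return {row label: {column label: value or None}}."""
    row_labels = sorted({rec[index] for rec in records})
    col_labels = sorted({rec[columns] for rec in records})
    # Start with every cell missing, then fill in the cells that exist.
    table = {row: {col: None for col in col_labels} for row in row_labels}
    seen = set()
    for rec in records:
        cell = (rec[index], rec[columns])
        if cell in seen:
            raise ValueError(f"duplicate entry for {cell}")
        seen.add(cell)
        table[rec[index]][rec[columns]] = rec[values]
    return table

records = [
    {"a": 1, "b": 1, "c": "one"},
    {"a": 1, "b": 2, "c": "two"},
    {"a": 2, "b": 3, "c": "three"},
]
table = pivot(records, index="a", columns="b", values="c")
print(table[2])  # {1: None, 2: None, 3: 'three'}
```
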
pop(item)

Return a column and drop it from the DataFrame.

pow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df ** 2
           angles  degrees
circle          0   129600
triangle        9    32400
rectangle      16   129600
>>> df.pow(2)
           angles  degrees
circle          0   129600
triangle        9    32400
rectangle      16   129600
prod(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return product of the values in the DataFrame.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
Series

Notes

Parameters currently not supported are level, numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.prod()
a      24
b    5040
dtype: int64
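
The effect of min_count on a single column can be sketched in plain Python, with None standing in for NA. The helper prod_with_min_count is hypothetical and always skips missing values; it only illustrates the rule stated above.

```python
import math

def prod_with_min_count(values, min_count=0):
    """Product of the non-missing values, or None if too few are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return None
    # math.prod of an empty list is 1, matching the default min_count=0 case.
    return math.prod(valid)

print(prod_with_min_count([1, 2, 3, 4]))               # 24
print(prod_with_min_count([None, None]))               # 1
print(prod_with_min_count([None, None], min_count=1))  # None
```
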
product(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return product of the values in the DataFrame.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
Series

Notes

Parameters currently not supported are level, numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.product()
a      24
b    5040
dtype: int64
quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear', columns=None, exact=True)

Return values at the given quantile.

Parameters
qfloat or array-like

0 <= q <= 1, the quantile(s) to compute

axisint

axis is a NON-FUNCTIONAL parameter

numeric_onlybool, default True

If False, the quantile of datetime and timedelta data will be computed as well.

interpolation{linear, lower, higher, midpoint, nearest}

This parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j. Default linear.

columnslist of str

List of column names to include.

exactboolean

Whether to use approximate or exact quantile algorithm.

Returns
Series or DataFrame

If q is an array or numeric_only is set to False, a DataFrame will be returned where index is q, the columns are the columns of self, and the values are the quantile.

If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Notes

One notable difference from pandas arises when the DataFrame contains non-numeric types and pandas would return a Series. In that case cuDF returns a DataFrame, because it does not support mixed types in a Series.

Examples

>>> import cupy as cp
>>> import cudf
>>> df = cudf.DataFrame(cp.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df
   a    b
0  1    1
1  2   10
2  3  100
3  4  100
>>> df.quantile(0.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0
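
The interpolation options can be checked by hand with a small plain-Python sketch. It assumes the common definition in which the quantile sits at position q * (n - 1) in the sorted data, which reproduces the outputs above; the function name is hypothetical, and 'nearest' is left out because its tie-breaking is implementation-specific.

```python
import math

def quantile(values, q, interpolation="linear"):
    """Quantile of a non-empty list of numbers for 0 <= q <= 1."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "linear":
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)
    if interpolation == "lower":
        return data[lo]
    if interpolation == "higher":
        return data[hi]
    if interpolation == "midpoint":
        return (data[lo] + data[hi]) / 2
    raise ValueError(f"unsupported interpolation: {interpolation}")

a = [1, 2, 3, 4]
b = [1, 10, 100, 100]
print(quantile(b, 0.5))             # 55.0
print(quantile(b, 0.5, "lower"))    # 10
print(quantile(b, 0.5, "higher"))   # 100
```
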
quantiles(q=0.5, interpolation='nearest')

Return values at the given quantile.

Parameters
qfloat or array-like

0 <= q <= 1, the quantile(s) to compute

interpolation{lower, higher, nearest}

This parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j. Default ‘nearest’.

Returns
DataFrame
query(expr, local_dict=None)

Query with a boolean expression using Numba to compile a GPU kernel.

See pandas.DataFrame.query.

Parameters
exprstr

A boolean expression. Names in expression refer to columns. index can be used instead of index name, but this is not supported for MultiIndex.

Names starting with @ refer to Python variables.

An output value will be null if any of the input values are null regardless of expression.

local_dictdict

A dictionary containing the local variables to be used in the query.

Returns
filteredDataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 2], 'b': [3, 4, 5]})
>>> expr = "(a == 2 and b == 4) or (b == 3)"
>>> df.query(expr)
   a  b
0  1  3
1  2  4

DateTime conditionals:

>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date')
                datetimes
1 2018-10-08T00:00:00.000

Using local_dict:

>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date2 = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date',
...         local_dict={'search_date':search_date2})
                datetimes
1 2018-10-08T00:00:00.000
radd(other, axis=1, level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> 1 + df
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.radd(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
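
The five tie-breaking methods can be compared side by side with a plain-Python sketch. The function below is hypothetical, ranks a single list in ascending order, and does not handle NaN values or the pct option; it only illustrates the definitions given for method.

```python
def rank(values, method="average"):
    """Ascending ranks (1 through n) of a list without missing values."""
    # Indices in sorted order; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # Find the end of the current group of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, idx in enumerate(order[start:end + 1]):
            if method == "average":
                r = (start + end) / 2 + 1
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = end + 1
            elif method == "first":
                r = start + offset + 1
            elif method == "dense":
                r = dense
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[idx] = float(r)
        start = end + 1
    return ranks

data = [7, 2, 7, 1]
print(rank(data))           # [3.5, 2.0, 3.5, 1.0]
print(rank(data, "first"))  # [3.0, 2.0, 4.0, 1.0]
```
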

rdiv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rtruediv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> 10 / df
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
reindex(labels=None, axis=0, index=None, columns=None, copy=True)

Return a new DataFrame whose axes conform to a new index.

DataFrame.reindex supports two calling conventions:
  • (index=index_labels, columns=column_names)

  • (labels, axis={0 or 'index', 1 or 'columns'})

Parameters
labelsIndex, Series-convertible, optional, default None
axis{0 or ‘index’, 1 or ‘columns’}, optional, default 0
indexIndex, Series-convertible, optional, default None

Shorthand for df.reindex(labels=index_labels, axis=0)

columnsarray-like, optional, default None

Shorthand for df.reindex(labels=column_names, axis=1)

copyboolean, optional, default True

Returns
A DataFrame whose axes conform to the new index(es)

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]
>>> df_new = df.reindex(index=[0, 3, 4, 5],
...                     columns=['key', 'val', 'sum'])
>>> df
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0
>>> df_new
   key   val  sum
0    0  10.0  NaN
3    3  13.0  NaN
4    4  14.0  NaN
5   -1   NaN  NaN
rename(mapper=None, index=None, columns=None, axis=0, copy=True, inplace=False, level=None, errors='ignore')

Alter column and index labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

DataFrame.rename supports two calling conventions:
  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={0/'index' or 1/'column'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Parameters
mapperdict-like or function, default None

Optional dict-like or function transformations to apply to the index/column values, depending on the selected axis.

indexdict-like, default None

Optional dict-like transformations to apply to the index axis’ values. Does not support functions for axis 0 yet.

columnsdict-like or function, default None

Optional dict-like or function transformations to apply to the columns axis’ values.

axisint, default 0

Axis to rename with mapper: 0 or ‘index’ for the index, 1 or ‘columns’ for the columns.

copyboolean, default True

Also copy underlying data

inplaceboolean, default False

Whether to modify the DataFrame in place. If False, return a new DataFrame; if True, assign the new labels without making a copy.

levelint or level name, default None

In case of a MultiIndex, only rename labels in the specified level.

errors{‘raise’, ‘ignore’, ‘warn’}, default ‘ignore’

Only ‘ignore’ is supported. Controls raising of exceptions on invalid data for the provided dtype.

  • raise : allow exceptions to be raised

  • ignore : suppress exceptions. On error return original object.

  • warn : prints last exceptions as warnings and return original object.

Returns
DataFrame

Notes

Difference from pandas:
  • Not supporting: level

Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df
   A  B
0  1  4
1  2  5
2  3  6

Rename columns using a mapping:

>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6

Rename index using a mapping:

>>> df.rename(index={0: 10, 1: 20, 2: 30})
    A  B
10  1  4
20  2  5
30  3  6
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
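
The difference between a scalar and an array-like repeats argument can be sketched on a plain list. The helper is hypothetical and ignores index labels; it only shows how the per-element counts are applied.

```python
def repeat(values, repeats):
    """Repeat each element consecutively; repeats is an int or one int per element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must have one entry per element")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]

print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```
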
replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method=None)

Replace values given in to_replace with replacement.

Parameters
to_replacenumeric, str, list-like or dict

Value(s) that will be replaced.

  • numeric or str:
    • values equal to to_replace will be replaced with value

  • list of numeric or str:
    • If value is also list-like, to_replace and value must be of the same length.

  • dict:
    • Dicts can be used to replace different values in different columns. For example, {‘a’: 1, ‘z’: 2} specifies that the value 1 in column a and the value 2 in column z should be replaced with value.

    • Dicts can be used to specify different replacement values for different existing values. For example, {‘a’: ‘b’, ‘y’: ‘z’} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way the value parameter should be None.

valuenumeric, str, list-like, or dict

Value(s) to replace to_replace with. If a dict is provided, then its keys must match the keys in to_replace, and corresponding values must be compatible (e.g., if they are lists, then they must match in length).

inplacebool, default False

If True, perform the replacement in place.

Returns
resultDataFrame

DataFrame after replacement.

Raises
TypeError
  • If to_replace is not a scalar, array-like, dict, or None

  • If to_replace is a dict and value is not a list, dict, or Series

ValueError
  • If a list is passed to to_replace and value but they are not the same length.

Notes

Parameters that are currently not supported are: limit, regex, method

Examples

Scalar to_replace and value

>>> import cudf
>>> df = cudf.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df
   A  B  C
0  0  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

List-like to_replace

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e

dict-like to_replace

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
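
The scalar, list, and old-to-new dict forms can be sketched for a single column in plain Python. The helper replace_values is hypothetical and does not cover the per-column dict form. It applies all replacements in one pass, so a value that has just been replaced is not replaced again, which matches the list-to-list output shown above.

```python
def replace_values(values, to_replace, value=None):
    """Replace entries of a list according to to_replace and value."""
    if isinstance(to_replace, dict):
        if value is not None:
            raise TypeError("pass value=None when to_replace maps old to new")
        mapping = dict(to_replace)
    elif isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must have the same length")
            mapping = dict(zip(to_replace, value))
        else:
            mapping = {old: value for old in to_replace}
    else:
        mapping = {to_replace: value}
    # Look up each original value once; unmatched values pass through.
    return [mapping.get(v, v) for v in values]

column_a = [0, 1, 2, 3, 4]
print(replace_values(column_a, [0, 1, 2, 3], [4, 3, 2, 1]))  # [4, 3, 2, 1, 4]
```
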
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Reset the index.

Reset the index of the DataFrame, and use the default one instead.

Parameters
dropbool, default False

Do not try to insert index into dataframe columns. This resets the index to the default integer index.

inplacebool, default False

Modify the DataFrame in place (do not create a new object).

Returns
DataFrame or None

DataFrame with the new index or None if inplace=True.

Examples

>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class max_speed
falcon    bird     389.0
parrot    bird      24.0
lion    mammal      80.5
monkey  mammal      <NA>
>>> df.reset_index()
    index   class max_speed
0  falcon    bird     389.0
1  parrot    bird      24.0
2    lion  mammal      80.5
3  monkey  mammal      <NA>
>>> df.reset_index(drop=True)
    class max_speed
0    bird     389.0
1    bird      24.0
2  mammal      80.5
3  mammal      <NA>
rfloordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'col1': [10, 11, 23],
... 'col2': [101, 122, 321]})
>>> df
   col1  col2
0    10   101
1    11   122
2    23   321
>>> df.rfloordiv(df)
   col1  col2
0     1     1
1     1     1
2     1     1
>>> df.rfloordiv(200)
   col1  col2
0    20     1
1    18     1
2     8     0
>>> df.rfloordiv(100)
   col1  col2
0    10     0
1     9     0
2     4     0
rmod(other, axis='columns', level=None, fill_value=None)

Get Modulo division of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> 100 % df
           angles  degrees
circle          0      100
triangle        1      100
rectangle       0      100
>>> df.rmod(100)
           angles  degrees
circle          0      100
triangle        1      100
rectangle       0      100
rmul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> other = cudf.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other * df
           angles degrees
circle          0    <NA>
triangle        9    <NA>
rectangle      16    <NA>
>>> df.rmul(other, fill_value=0)
           angles  degrees
circle          0        0
triangle        9        0
rectangle      16        0
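
The fill_value rule shared by the flexible arithmetic wrappers can be sketched element by element in plain Python, with None standing in for a missing value. The helper combine_with_fill is hypothetical and assumes the two inputs are already aligned.

```python
import operator

def combine_with_fill(left, right, op, fill_value=None):
    """Apply op pairwise, substituting fill_value when exactly one side is missing."""
    result = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Missing on both sides: the result stays missing.
            result.append(None)
        elif (a is None or b is None) and fill_value is None:
            result.append(None)
        else:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
            result.append(op(a, b))
    return result

degrees = [360, 180, 360]
missing = [None, None, None]  # 'other' has no degrees column
no_fill = combine_with_fill(missing, degrees, operator.mul)
filled = combine_with_fill(missing, degrees, operator.mul, fill_value=0)
print(no_fill)  # [None, None, None]
print(filled)   # [0, 0, 0]
```
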
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as the index and the number of decimal places as values:

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
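
The per-column rule for decimals can be sketched in plain Python. This is a minimal illustration under stated assumptions, not cuDF's implementation: the helper round_columns is hypothetical, the table is a dict of lists, and Python's built-in round stands in for the GPU rounding kernel (tie-breaking may differ).

```python
def round_columns(table, decimals):
    """Round a dict of column name -> list of floats.

    decimals is an int (applied to every column) or a dict of
    column name -> number of places.
    """
    if isinstance(decimals, int):
        decimals = {name: decimals for name in table}
    result = {}
    for name, values in table.items():
        if name in decimals:
            result[name] = [round(v, decimals[name]) for v in values]
        else:
            result[name] = list(values)  # columns not listed are left as is
    return result


table = {"dogs": [0.21, 0.01, 0.66, 0.21], "cats": [0.32, 0.67, 0.03, 0.18]}

# 'birds' is not a column of the table, so that entry is ignored.
rounded = round_columns(table, {"dogs": 1, "birds": 2})
```

Only the dogs column changes in rounded; cats is passed through untouched and no birds column is created.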
rpow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> 1 ** df
           angles  degrees
circle          1        1
triangle        1        1
rectangle       1        1
>>> df.rpow(1)
           angles  degrees
circle          1        1
triangle        1        1
rectangle       1        1
rsub(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rsub(1)
           angles  degrees
circle          1     -359
triangle       -2     -179
rectangle      -3     -359
>>> df.rsub([1, 2])
           angles  degrees
circle          1     -358
triangle       -2     -178
rectangle      -3     -358
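
The two rsub calls above differ in how other is matched against the frame: a scalar applies to every element, while a list supplies one value per column. A plain-Python sketch of that reading follows; the helper rsub_columns is hypothetical and models the frame as a dict of lists.

```python
def rsub_columns(table, other):
    """Compute other - value for each element of a dict of column -> list.

    other is a scalar, or a list/tuple with one entry per column.
    """
    names = list(table)
    if not isinstance(other, (list, tuple)):
        other = [other] * len(names)
    if len(other) != len(names):
        raise ValueError("other must supply one value per column")
    return {name: [o - v for v in table[name]] for name, o in zip(names, other)}


table = {"angles": [0, 3, 4], "degrees": [360, 180, 360]}
scalar_result = rsub_columns(table, 1)
list_result = rsub_columns(table, [1, 2])
```

The results reproduce the values shown in the examples: 1 is subtracted from in the angles column and 2 in the degrees column when a list is passed.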
rtruediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rtruediv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> 10 / df
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis=1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis=1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
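
How n, frac, replace and random_state interact can be sketched with the standard library. This is an illustration only: the helper sample_positions is hypothetical, it returns row positions rather than a DataFrame, and rounding frac * n_rows to the nearest integer is an assumption of this sketch rather than documented cuDF behavior.

```python
import random


def sample_positions(n_rows, n=None, frac=None, replace=False, random_state=None):
    """Return sampled row positions for a table with n_rows rows."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        n = 1 if frac is None else round(frac * n_rows)
    rng = random.Random(random_state)  # the same seed gives the same sample
    if replace:
        # Sampling with replacement may repeat a position and may exceed n_rows.
        return [rng.randrange(n_rows) for _ in range(n)]
    return rng.sample(range(n_rows), n)


first = sample_positions(5, n=3, random_state=42)
second = sample_positions(5, n=3, random_state=42)
oversampled = sample_positions(5, n=10, replace=True, random_state=0)
```

Because both calls use the same seed, first and second are identical; oversampled has ten entries even though only five rows exist, which is only possible with replace=True.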
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
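
The scatter operation can be pictured with a short plain-Python sketch. It is not cuDF's implementation: the helper scatter_rows is hypothetical, rows are arbitrary Python objects, map_index is assumed to hold integer destinations in the range [0, map_size), and the default of "largest destination + 1" when map_size is omitted is an assumption of this sketch.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Split rows into map_size lists; map_index[i] is the destination of rows[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    if map_size is None:
        map_size = max(map_index, default=-1) + 1  # assumption of this sketch
    if map_size < len(set(map_index)):
        raise ValueError("map_size must be >= the number of unique destinations")
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)
    return parts


rows = ["r0", "r1", "r2", "r3"]
parts = scatter_rows(rows, map_index=[1, 0, 1, 3], map_size=4)
```

Destination 2 receives no rows, so parts contains an empty list in that slot; the output list always has map_size entries.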
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’}, optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, the last such index is returned.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’}, optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
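
For a single ascending column, the ‘left’ and ‘right’ sides correspond to the standard library's bisect_left and bisect_right. The sketch below illustrates that correspondence only; the helper insertion_points is hypothetical and does not cover multi-column frames, descending order, or nulls.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    """Return the insertion index of each value in an ascending list."""
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
outside = insertion_points(s, [0, 4])
left = insertion_points(s, [1, 3], side="left")
right = insertion_points(s, [1, 3], side="right")

# Binary search assumes sorted input; on unsorted data the answer is unreliable.
unsorted_result = insertion_points([2, 1, 3], [1])
```

The values of outside, left and right agree with the Series examples above, and unsorted_result reproduces the misleading 0 from the unsorted example.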
select_dtypes(include=None, exclude=None)

Return a subset of the DataFrame’s columns based on the column dtypes.

Parameters
includestr or list

which columns to include based on dtypes

excludestr or list

which columns to exclude based on dtypes

Returns
DataFrame

The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

Raises
ValueError
  • If both of include and exclude are empty

  • If include and exclude have overlapping elements

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0
>>> df.select_dtypes(include='bool')
       b
0   True
1  False
2   True
3  False
4   True
5  False
>>> df.select_dtypes(include=['float64'])
     c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['int'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Return a new DataFrame with a new index

Parameters
keysIndex, Series-convertible, label-like, or list

Index : the new index. Series-convertible : values for the new index. Label-like : Label of column to be used as index. List : List of items from above.

dropboolean, default True

Whether to drop corresponding column for str index argument

appendboolean, default False

Whether to append columns to the existing index, resulting in a MultiIndex.

inplaceboolean, default False

Modify the DataFrame in place (do not create a new object).

verify_integrityboolean, default False

Check for duplicates in the new index.

Examples

>>> df = cudf.DataFrame({
...     "a": [1, 2, 3, 4, 5],
...     "b": ["a", "b", "c", "d","e"],
...     "c": [1.0, 2.0, 3.0, 4.0, 5.0]
... })
>>> df
   a  b    c
0  1  a  1.0
1  2  b  2.0
2  3  c  3.0
3  4  d  4.0
4  5  e  5.0

Set the index to become the ‘b’ column:

>>> df.set_index('b')
   a    c
b
a  1  1.0
b  2  2.0
c  3  3.0
d  4  4.0
e  5  5.0

Create a MultiIndex using columns ‘a’ and ‘b’:

>>> df.set_index(["a", "b"])
       c
a b
1 a  1.0
2 b  2.0
3 c  3.0
4 d  4.0
5 e  5.0

Set new Index instance as index:

>>> df.set_index(cudf.RangeIndex(10, 15))
    a  b    c
10  1  a  1.0
11  2  b  2.0
12  3  c  3.0
13  4  d  4.0
14  5  e  5.0

Setting append=True will combine current index with column a:

>>> df.set_index("a", append=True)
     b    c
  a
0 1  a  1.0
1 2  b  2.0
2 3  c  3.0
3 4  d  4.0
4 5  e  5.0

set_index supports inplace parameter too:

>>> df.set_index("a", inplace=True)
>>> df
   b    c
a
1  a  1.0
2  b  2.0
3  c  3.0
4  d  4.0
5  e  5.0
property shape

Returns a tuple representing the dimensionality of the DataFrame.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
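
The inputs in these examples are interpreted as radians, which is why sin of 90 is not 1. A short standard-library check illustrates the difference; math.radians is used here as one way to convert values that are meant as degrees.

```python
import math

angles = [45, 90, 180, 360]

# Treating the numbers as radians, as in the examples above.
as_radians = [math.sin(x) for x in angles]

# Converting from degrees first gives the familiar values.
from_degrees = [math.sin(math.radians(x)) for x in angles]
```

The second entry of as_radians is about 0.893997, as in the Series output above, while the second entry of from_degrees is 1 up to floating-point error.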
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased Fisher-Pearson skew of a sample.

Parameters
skipna: bool, default True

Exclude NA/null values when computing the result.

Returns
Series

Notes

Parameters currently not supported are axis, level and numeric_only

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 8, 10, 10]})
>>> df.skew()
a    0.00000
b   -0.37037
dtype: float64
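
The example output can be reproduced with the adjusted Fisher-Pearson coefficient, G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below is plain Python for a single column; the helper sample_skew is hypothetical, and returning NaN for fewer than three values and 0.0 for a constant column are conventions of this sketch rather than documented cuDF behavior.

```python
import math


def sample_skew(values, skipna=True):
    """Adjusted Fisher-Pearson skewness of a list; None marks a null."""
    if skipna:
        values = [v for v in values if v is not None]
    elif any(v is None for v in values):
        return None
    n = len(values)
    if n < 3:
        return float("nan")  # convention of this sketch
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant column; convention of this sketch
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


skew_a = sample_skew([3, 2, 3, 4])
skew_b = sample_skew([7, 8, 10, 10])
```

Column a is symmetric around its mean, so skew_a is 0; skew_b is approximately -0.37037, matching the output above.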
sort_index(axis=0, level=None, ascending=True, inplace=False, kind=None, na_position='last', sort_remaining=True, ignore_index=False)

Sort object by labels (along an axis).

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.

levelint or level name or list of ints or list of level names

If not None, sort on values in specified index level(s). This is only useful in the case of MultiIndex.

ascendingbool, default True

Sort ascending vs. descending.

inplacebool, default False

If True, perform operation in-place.

kindsorting method such as quick sort and others.

Not yet supported.

na_position{‘first’, ‘last’}, default ‘last’

Puts NaNs at the beginning if first; last puts NaNs at the end.

sort_remainingbool, default True

Not yet supported

ignore_indexbool, default False

If True, the index will be replaced with a RangeIndex.

Returns
DataFrame or None

Examples

>>> df = cudf.DataFrame(
... {"b":[3, 2, 1], "a":[2, 1, 3]}, index=[1, 3, 2])
>>> df.sort_index(axis=0)
   b  a
1  3  2
2  1  3
3  2  1
>>> df.sort_index(axis=1)
   a  b
1  2  3
3  1  2
2  3  1
sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)

Sort by the values row-wise.

Parameters
bystr or list of str

Name or list of names to sort by.

ascendingbool or list of bool, default True

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

na_position{‘first’, ‘last’}, default ‘last’

‘first’ puts nulls at the beginning, ‘last’ puts nulls at the end

ignore_indexbool, default False

If True, the original index is not carried over to the result, which is labeled 0, 1, …, n - 1 instead.

Returns
sorted_objcuDF DataFrame

Notes

Difference from pandas:
  • Supports axis=‘index’ only.

  • Does not support: inplace, kind.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 2], 'b': [-3, 2, 0]})
>>> df.sort_values('b')
   a  b
0  0 -3
2  2  0
1  1  2
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
stack(level=-1, dropna=True)

Stack the prescribed level(s) from columns to index

Return a reshaped Series

Parameters
dropnabool, default True

Whether to drop rows in the resulting Series with missing values.

Returns
The stacked cudf.Series

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a':[0,1,3], 'b':[1,2,4]})
>>> df.stack()
0  a    0
   b    1
1  a    1
   b    2
2  a    3
   b    4
dtype: int64
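
The reshaping performed by stack can be sketched in plain Python: each (row label, column name) pair becomes one entry of the result, and entries with missing values are skipped when dropna is True. The helper stack_table is hypothetical and models the frame as a dict of lists with None for nulls.

```python
def stack_table(table, index=None, dropna=True):
    """Return [((row_label, column_name), value), ...] in row-major order."""
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    labels = list(range(n_rows)) if index is None else list(index)
    stacked = []
    for i, label in enumerate(labels):
        for name in names:
            value = table[name][i]
            if dropna and value is None:
                continue  # drop entries with missing values
            stacked.append(((label, name), value))
    return stacked


stacked = stack_table({"a": [0, 1, 3], "b": [1, 2, 4]})
with_null = stack_table({"a": [0, None], "b": [1, 2]})
```

The entries of stacked follow the same order as the Series shown above, with the row label as the outer index level and the column name as the inner one.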
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return sample standard deviation of the DataFrame.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof: int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns
Series

Notes

Parameters currently not supported are level and numeric_only

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.std()
a    1.290994
b    1.290994
dtype: float64
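
The role of ddof can be checked with a few lines of plain Python: the sum of squared deviations is divided by N - ddof before taking the square root. The helper column_std is hypothetical and handles a single column without nulls; returning NaN when N - ddof is not positive is a convention of this sketch.

```python
import math


def column_std(values, ddof=1):
    """Standard deviation with divisor N - ddof."""
    n = len(values)
    if n - ddof <= 0:
        return float("nan")  # convention of this sketch
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


sample_std = column_std([1, 2, 3, 4])              # divisor N - 1
population_std = column_std([1, 2, 3, 4], ddof=0)  # divisor N
```

With the default ddof=1, sample_std is approximately 1.290994, matching the output above; ddof=0 gives the smaller population value.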
sub(other, axis='columns', level=None, fill_value=None)

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df.sub(1)
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
>>> df.sub([1, 2])
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
sum(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return sum of the values in the DataFrame.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0, which means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
Series

Notes

Parameters currently not supported are level, numeric_only.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.sum()
a    10
b    34
dtype: int64
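
The interaction between skipna and min_count can be sketched for a single column in plain Python. The helper column_sum is hypothetical, and None stands in for an NA value both in the input and as the NA result.

```python
def column_sum(values, skipna=True, min_count=0):
    """Sum a list in which None marks a null; return None when the result is NA."""
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        return None  # a null propagates when nulls are not skipped
    if len(valid) < min_count:
        return None  # too few valid values to produce a result
    return sum(valid)


all_null_default = column_sum([None, None])
all_null_strict = column_sum([None, None], min_count=1)
plain_total = column_sum([1, 2, 3, 4])
```

With the default min_count of 0, the all-null column sums to 0; requiring at least one valid value turns the same input into an NA result.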
tail(n=5)

Returns the last n rows as a new DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df.tail(2)
   key   val
3    3  13.0
4    4  14.0
take(positions, keep_index=True)

Return a new DataFrame containing the rows specified by positions

Parameters
positionsarray-like

Integer or boolean array-like specifying the rows of the output. If integer, each element represents the integer index of a row. If boolean, positions must be of the same length as self, and represents a boolean mask.

Returns
outDataFrame

New DataFrame

Examples

>>> a = cudf.DataFrame({'a': [1.0, 2.0, 3.0],
...                    'b': cudf.Series(['a', 'b', 'c'])})
>>> a.take([0, 2, 2])
     a  b
0  1.0  a
2  3.0  c
2  3.0  c
>>> a.take([True, False, True])
     a  b
0  1.0  a
2  3.0  c
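
The two accepted forms of positions can be sketched in plain Python. The helper take_rows is hypothetical; it returns (original position, row) pairs so the retained index is visible, mirroring the index values in the output above.

```python
def take_rows(rows, positions):
    """Select rows by integer positions or by a boolean mask."""
    if positions and all(isinstance(p, bool) for p in positions):
        if len(positions) != len(rows):
            raise ValueError("a boolean mask must have one entry per row")
        return [(i, row) for i, (row, keep) in enumerate(zip(rows, positions)) if keep]
    # Integer positions may repeat and may appear in any order.
    return [(p, rows[p]) for p in positions]


rows = [(1.0, "a"), (2.0, "b"), (3.0, "c")]
by_position = take_rows(rows, [0, 2, 2])
by_mask = take_rows(rows, [True, False, True])
```

Integer positions can select the same row more than once, as by_position shows, whereas a mask keeps each row at most once.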
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
countint

Number of times to tile the rows. Must be non-negative.

Returns
DataFrame

The DataFrame containing the tiled rows.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_arrow(preserve_index=True)

Convert to a PyArrow Table.

Parameters
preserve_indexbool, default True

Whether to store the index column and its metadata in the resulting table.

Returns
PyArrow Table

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     {"a":[1, 2, 3], "b":[4, 5, 6]}, index=[1, 2, 3])
>>> df.to_arrow()
pyarrow.Table
a: int64
b: int64
index: int64
>>> df.to_arrow(preserve_index=False)
pyarrow.Table
a: int64
b: int64
to_csv(path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator='\n', chunksize=None, encoding=None, compression=None, **kwargs)

Write a dataframe to csv file format.

Parameters
path_or_bufstr or file handle, default None

File path or object, if None is provided the result is returned as a string.

sepchar, default ‘,’

Delimiter to be used.

na_repstr, default ‘’

String to use for null entries

columnslist of str, optional

Columns to write

headerbool, default True

Write out the column names

indexbool, default True

Write out the index as a column

line_terminatorchar, default ‘\n’

Character that terminates each line of output.

chunksizeint or None, default None

Rows to write at a time

encoding: str, default ‘utf-8’

A string representing the encoding to use in the output file. Only ‘utf-8’ is currently supported.

compression: str, None

A string representing the compression scheme to use in the output file. Compression while writing csv is not currently supported.

Returns
None or str

If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

Notes

  • Follows the standard of Pandas csv.QUOTE_NONNUMERIC for all output.

  • If to_csv leads to memory errors consider setting the chunksize argument.

Examples

Write a dataframe to csv.

>>> import cudf
>>> filename = 'foo.csv'
>>> df = cudf.DataFrame({'x': [0, 1, 2, 3],
...                      'y': [1.0, 3.3, 2.2, 4.4],
...                      'z': ['a', 'b', 'c', 'd']})
>>> df = df.set_index(cudf.Series([3, 2, 1, 0]))
>>> df.to_csv(filename)
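
The QUOTE_NONNUMERIC convention mentioned in the notes can be seen with Python's standard csv module: non-numeric fields are wrapped in quotes and numeric fields are written bare. The sketch below only illustrates that quoting rule on in-memory rows; it does not call cuDF and is not meant to reproduce the exact file written above.

```python
import csv
import io

rows = [
    [3, 0, 1.0, "a"],
    [2, 1, 3.3, "b"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
```

In text, the string fields appear as "a" and "b" with quotes, while the integers and floats are written without them.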
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column

Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_feather(path, *args, **kwargs)

Write a DataFrame to the feather format.

Parameters
pathstr

File path

to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

In order to add another DataFrame or Series to an existing HDF file, please use append mode and a different key.

For more information see the user guide.

Parameters
path_or_bufstr or pandas.HDFStore

File path or HDFStore object.

keystr

Identifier for the group in the store.

mode{‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

  • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

  • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’: similar to ‘a’, but the file must already exist.

format{‘fixed’, ‘table’}, default ‘fixed’

Possible values:

  • ‘fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

  • ‘table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

appendbool, default False

For Table formats, append the input data to the existing data.

data_columnslist of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format=‘table’.

complevel{0-9}, optional

Specifies a compression level for data. A value of 0 disables compression.

complib{‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

fletcher32bool, default False

If applying compression use the fletcher32 checksum.

dropnabool, default False

If true, ALL nan rows will not be written to store.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

See also

cudf.io.hdf.read_hdf

Read from HDF file.

cudf.io.parquet.to_parquet

Write a DataFrame to the binary parquet format.

cudf.io.feather.to_feather

Write out feather-format for DataFrames.

to_json(path_or_buf=None, *args, **kwargs)

Convert the cuDF object to a JSON string. Note nulls and NaNs will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters
path_or_bufstring or file handle, optional

File path or object. If not specified, the result is returned as a string.

orientstring

Indication of expected JSON string format.

  • Series
    • default is ‘index’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}

  • DataFrame
    • default is ‘columns’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}

  • The format of the JSON string
    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, and the data component is like orient='records'.

date_format{None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precisionint, default 10

The number of decimal places to use when encoding floating point values.

force_asciibool, default True

Force encoded string to be ASCII.

date_unitstring, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handlercallable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.

linesbool, default False

If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

indexbool, default True

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.
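
The orient layouts listed above can be made concrete by building them by hand with the standard json module. This sketch does not call cuDF: the small table, the variable names, and the use of string keys for index labels are illustrative assumptions, and key order inside each object is not significant.

```python
import json

index = [0, 1]
columns = ["a", "b"]
data = [[1, "x"], [2, "y"]]

# 'records': list like [{column -> value}, ...]
records = [dict(zip(columns, row)) for row in data]

# 'index': dict like {index -> {column -> value}}
by_index = {str(i): dict(zip(columns, row)) for i, row in zip(index, data)}

# 'columns': dict like {column -> {index -> value}}
by_column = {
    name: {str(i): row[j] for i, row in zip(index, data)}
    for j, name in enumerate(columns)
}

# 'split': dict with separate index, columns and data entries
split = {"index": index, "columns": columns, "data": data}

records_json = json.dumps(records)
# With lines=True and orient='records', each record goes on its own line.
records_lines = "\n".join(json.dumps(record) for record in records)
```

Only the ‘records’ layout is a list of independent objects, which is why the line-delimited form is tied to that orient.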

to_orc(fname, compression=None, *args, **kwargs)

Write a DataFrame to the ORC format.

Parameters
fnamestr

File path or object where the ORC dataset will be stored.

compression{‘snappy’, None}, default None

Name of the compression to use. Use None for no compression.

enable_statistics: boolean, default True

Enable writing column statistics.

to_pandas(nullable=False, **kwargs)

Convert to a Pandas DataFrame.

Parameters
nullablebool, default False

If nullable is True, the resulting columns in the DataFrame will have a corresponding nullable Pandas dtype. If nullable is False, null values in the resulting columns are converted to np.nan or None, depending on the dtype.

Returns
outPandas DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 2], 'b': [-3, 2, 0]})
>>> pdf = df.to_pandas()
>>> pdf
   a  b
0  0 -3
1  1  2
2  2  0
>>> type(pdf)
<class 'pandas.core.frame.DataFrame'>

nullable parameter can be used to control whether dtype can be Pandas Nullable or not:

>>> df = cudf.DataFrame({'a': [0, None, 2], 'b': [True, False, None]})
>>> df
      a      b
0     0   True
1  <NA>  False
2     2   <NA>
>>> pdf = df.to_pandas(nullable=True)
>>> pdf
      a      b
0     0   True
1  <NA>  False
2     2   <NA>
>>> pdf.dtypes
a      Int64
b    boolean
dtype: object
>>> pdf = df.to_pandas(nullable=False)
>>> pdf
     a      b
0  0.0   True
1  NaN  False
2  2.0   None
>>> pdf.dtypes
a    float64
b     object
dtype: object
to_parquet(path, *args, **kwargs)

Write a DataFrame to the parquet format.

Parameters
pathstr

File path or Root Directory path. Will be used as Root Directory path while writing a partitioned dataset.

compression{‘snappy’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

indexbool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the engine’s default behavior will be used. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

partition_colslist, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given.

partition_file_namestr, optional, default None

File name to use for partitioned datasets. Different partitions will be written to different directories, but all files will have this name. If nothing is specified, a random uuid4 hex string will be used for each file.

int96_timestampsbool, default False

If True, write timestamps in int96 format. This will convert timestamps from timestamp[ns], timestamp[ms], timestamp[s], and timestamp[us] to the int96 format, which is the number of Julian days and the number of nanoseconds since midnight. If False, timestamps will not be altered.
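
As a rough illustration of the int96 layout described above (a Julian day number plus the nanoseconds elapsed since midnight), the following plain-Python sketch splits a naive datetime into those two parts. The helper name to_int96_parts is hypothetical and is not part of cuDF; the sketch does not show how cuDF encodes the bytes.

```python
from datetime import datetime

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
UNIX_EPOCH = datetime(1970, 1, 1)


def to_int96_parts(ts):
    """Split a naive datetime into (julian_day, nanoseconds_since_midnight).

    Hypothetical helper for illustration only.
    """
    delta = ts - UNIX_EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    # timedelta keeps the within-day remainder in seconds and microseconds.
    nanos = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    return julian_day, nanos


print(to_int96_parts(datetime(2018, 10, 7, 12, 0, 0)))
# (2458399, 43200000000000)
```

Because datetime only carries microsecond precision, this sketch cannot represent sub-microsecond values that a timestamp[ns] column may hold.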

to_records(index=True)

Convert to a numpy recarray

Parameters
indexbool

Whether to include the index in the output.

Returns
numpy recarray
to_string()

Convert to string

cuDF uses Pandas internals for efficient string formatting. Set formatting options using pandas string formatting options and cuDF objects will print identically to Pandas objects.

cuDF supports null/None as a value in any column type, which is transparently supported during this output process.

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2]
>>> df['val'] = [float(i + 10) for i in range(3)]
>>> df.to_string()
'   key   val\n0    0  10.0\n1    1  11.0\n2    2  12.0'
transpose()

Transpose index and columns.

Returns
a new (ncol x nrow) dataframe. self is (nrow x ncol)

Notes

Difference from pandas: the copy parameter is not supported, because the default and only behavior is copy=True.

truediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
otherscalar, sequence, Series, or DataFrame

Any single or multiple element data structure, or list-like object.

fill_valuefloat or None, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
DataFrame

Result of the arithmetic operation.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df.truediv(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df / 10
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
unstack(level=-1, fill_value=None)

Pivot one or more levels of the (necessarily hierarchical) index labels.

Pivots the specified levels of the index labels of df to the innermost levels of the columns labels of the result.

  • If the index of df has multiple levels, returns a DataFrame with the specified level of the index pivoted to the column levels.

  • If the index of df has a single level, returns a Series with all column levels pivoted to the index levels.
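
The multi-level case can be modeled in plain Python as moving one component of each index tuple into the column labels. This is only a sketch of the reshaping logic, not cuDF's implementation: the function name unstack, the dict-of-tuples representation, and the use of None for missing cells are all assumptions made for illustration. The data mirrors the three-level example shown under Examples.

```python
def unstack(rows, level):
    """Pivot one index level of {index_tuple: value} into column labels.

    Returns {remaining_index_tuple: {column_label: value_or_None}}.
    """
    column_labels = sorted({key[level] for key in rows})
    remaining_keys = sorted({key[:level] + key[level + 1:] for key in rows})
    result = {}
    for rest in remaining_keys:
        result[rest] = {}
        for label in column_labels:
            # Rebuild the original index tuple to look up the cell.
            full_key = rest[:level] + (label,) + rest[level:]
            result[rest][label] = rows.get(full_key)  # None marks a missing cell
    return result


# Index levels are (a, b, d); the values are column c.
rows = {
    (1, 1, "a"): 5,
    (1, 2, "b"): 6,
    (1, 3, "a"): 7,
    (2, 1, "d"): 8,
    (2, 2, "e"): 9,
}

for index_key, cells in unstack(rows, level=0).items():
    print(index_key, cells)
```

Each remaining index tuple becomes a row, and every distinct value of the pivoted level becomes a column; combinations that never occurred in the input are left missing.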

Parameters
dfDataFrame
levellevel name or index, list-like

Integer, name or list of such, specifying one or more levels of the index to pivot

fill_value

Non-functional argument provided for compatibility with Pandas.

Returns
Series or DataFrame

Examples

>>> df['a'] = [1, 1, 1, 2, 2]
>>> df['b'] = [1, 2, 3, 1, 2]
>>> df['c'] = [5, 6, 7, 8, 9]
>>> df['d'] = ['a', 'b', 'a', 'd', 'e']
>>> df = df.set_index(['a', 'b', 'd'])
>>> df
       c
a b d
1 1 a  5
  2 b  6
  3 a  7
2 1 d  8
  2 e  9

Unstacking level ‘a’:

>>> df.unstack('a')
        c
a       1     2
b d
1 a     5  <NA>
  d  <NA>     8
2 b     6  <NA>
  e  <NA>     9
3 a     7  <NA>

Unstacking level ‘d’ :

>>> df.unstack('d')
        c
d       a     b     d     e
a b
1 1     5  <NA>  <NA>  <NA>
  2  <NA>     6  <NA>  <NA>
  3     7  <NA>  <NA>  <NA>
2 1  <NA>  <NA>     8  <NA>
  2  <NA>  <NA>  <NA>     9

Unstacking multiple levels:

>>> df.unstack(['b', 'd'])
      c
b     1           2           3
d     a     d     b     e     a
a
1     5  <NA>     6  <NA>     7
2  <NA>     8  <NA>     9  <NA>

Unstacking single level index dataframe:

>>> df = cudf.DataFrame({('c', 1): [1, 2, 3], ('c', 2):[9, 8, 7]})
>>> df.unstack()
c  1  0    1
      1    2
      2    3
   2  0    9
      1    8
      2    7
dtype: int64
update(other, join='left', overwrite=True, filter_func=None, errors='ignore')

Modify a DataFrame in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

Parameters
otherDataFrame, or object coercible into a DataFrame

Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.

join{‘left’}, default ‘left’

Only left join is implemented, keeping the index and columns of the original object.

overwrite{True, False}, default True

How to handle non-NA values for overlapping keys: True: overwrite original DataFrame’s values with values from other. False: only update values that are NA in the original DataFrame.

filter_funcNone

filter_func is not supported yet. Return True for values that should be updated.

errors{‘raise’, ‘ignore’}, default ‘ignore’

If ‘raise’, will raise a ValueError if the DataFrame and other both contain non-NA data in the same place.
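
The interaction between overwrite and errors can be sketched for a single column in plain Python. The helper update_column is hypothetical, None stands in for NA, and the two lists are assumed to be already aligned on the index; cuDF's actual alignment and in-place mechanics are not shown.

```python
def update_column(original, other, overwrite=True, errors="ignore"):
    """Update the list `original` in place from `other`; returns None."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be either 'ignore' or 'raise'")
    if errors == "raise":
        # Check for overlap first so nothing is modified before raising.
        for old, new in zip(original, other):
            if old is not None and new is not None:
                raise ValueError("Data overlaps.")
    for position, (old, new) in enumerate(zip(original, other)):
        if new is None:
            continue  # NA values in `other` never replace anything
        if overwrite or old is None:
            original[position] = new


column = [1, None, 3]
update_column(column, [10, 20, None])
print(column)  # [10, 20, 3]

column = [1, None, 3]
update_column(column, [10, 20, None], overwrite=False)
print(column)  # [1, 20, 3]
```

With overwrite=False only the NA slot is filled, while the default replaces every position where other holds a non-NA value.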

Returns
None : method directly changes calling object
Raises
ValueError
  • When errors = ‘raise’ and there’s overlapping non-NA data.

  • When errors is not either ‘ignore’ or ‘raise’

NotImplementedError
  • If join != ‘left’

property values

Return a CuPy representation of the DataFrame.

Only the values in the DataFrame will be returned; the axes labels will be removed.

Returns
out: cupy.ndarray

The values of the DataFrame.

var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return unbiased variance of the DataFrame.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters
axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof: int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
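
The effect of ddof on the divisor can be checked with a small plain-Python sketch. The function variance is a hypothetical stand-in that works on one column given as a list, with None representing a null that is skipped.

```python
def variance(values, ddof=1):
    """Variance of a list with divisor N - ddof, skipping None entries."""
    data = [v for v in values if v is not None]
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((v - mean) ** 2 for v in data)
    return squared_deviations / (n - ddof)


print(round(variance([1, 2, 3, 4]), 6))          # 1.666667 (divisor N - 1)
print(variance([1, 2, 3, 4], ddof=0))            # 1.25 (divisor N)
```

The ddof=1 result matches the value shown for column a in the example under Examples.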

Returns
scalar

Notes

Parameters currently not supported are level and numeric_only

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.var()
a    1.666667
b    1.666667
dtype: float64
where(cond, other=None, inplace=False)

Replace values where the condition is False.

Parameters
condbool Series/DataFrame, array-like

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.
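
The selection rule (keep where cond is True, otherwise substitute other) can be written out for a single column in plain Python. This is a sketch with a hypothetical function name and a scalar other only; None stands in for a null.

```python
def where(values, cond, other=None):
    """Keep each value whose condition is True; otherwise use `other`."""
    return [value if keep else other for value, keep in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [value > 2 for value in ser]

print(where(ser, cond, 10))  # [4, 3, 10, 10, 10]
print(where(ser, cond))      # [4, 3, None, None, None]
```

Both results correspond to the Series examples shown under Examples, with None playing the role of <NA>.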

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    <NA>
3    <NA>
4    <NA>
dtype: int64

Series

class cudf.core.series.Series(data=None, index=None, dtype=None, name=None, nan_as_null=True)

One-dimensional GPU array (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as null/NaN).

Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.

Series objects are used as columns of DataFrame.

Parameters
dataarray-like, Iterable, dict, or scalar value

Contains data stored in Series.

indexarray-like or Index (1d)

Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.

dtypestr, numpy.dtype, or ExtensionDtype, optional

Data type for the output Series. If not specified, this will be inferred from data.

namestr, optional

The name to give to the Series.

nan_as_nullbool, Default True

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
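
The distinction nan_as_null draws between NaN (a floating-point value) and null (a missing entry) can be modeled in plain Python. The helper normalize is hypothetical and uses None to represent a null.

```python
import math


def normalize(values, nan_as_null=True):
    """Return a copy of `values`, optionally turning float NaN into None."""
    if nan_as_null is False:
        return list(values)  # NaN values are left as they are
    # Both True and None request the conversion.
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


data = [1.0, float("nan"), 3.0]
print(normalize(data))                     # [1.0, None, 3.0]
print(normalize(data, nan_as_null=False))  # [1.0, nan, 3.0]
```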

Attributes
cat

Accessor object for categorical properties of the Series values.

data

The gpu buffer for the data

dt

Accessor object for datetimelike properties of the Series values.

dtype

dtype of the Series

empty

Indicator whether DataFrame or Series is empty.

has_nulls

Indicator whether Series contains null values.

iloc

Select values by position.

index

The index object

is_monotonic

Return boolean if values in the object are monotonic_increasing.

is_monotonic_decreasing

Return boolean if values in the object are monotonic_decreasing.

is_monotonic_increasing

Return boolean if values in the object are monotonic_increasing.

is_unique

Return boolean if values in the object are unique.

list
loc

Select values by label.

name

Returns name of the Series.

ndim

Dimension of the data.

null_count

Number of null values

nullable

A boolean indicating whether a null-mask is needed

nullmask

The gpu buffer for the null-mask

shape

Returns a tuple representing the dimensionality of the Series.

size

Return the number of elements in the underlying data.

str

Vectorized string functions for Series and Index.

valid_count

Number of non-null values

values

Return a CuPy representation of the Series.

values_host

Return a numpy representation of the Series.

Methods

abs()

Absolute value of each element of the series.

acos()

Get Trigonometric inverse cosine, element-wise.

add(other[, fill_value, axis])

Addition of series and other, element-wise (binary operator add).

all([axis, bool_only, skipna, level])

Return whether all elements are True in Series.

any([axis, bool_only, skipna, level])

Return whether any element is True in Series.

append(to_append[, ignore_index, …])

Append values from another Series or array-like object.

applymap(udf[, out_dtype])

Apply an elementwise function to transform the values in the Column.

argsort([ascending, na_position])

Returns a Series of int64 index that will sort the series.

as_index()

Returns a new Series with a RangeIndex.

as_mask()

Convert booleans to bitmask

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy, errors])

Cast the Series to the given dtype

atan()

Get Trigonometric inverse tangent, element-wise.

ceil()

Rounds each value upward to the smallest integral value not less than the original.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([deep])

Make a copy of this object’s indices and data.

corr(other[, method, min_periods])

Calculates the sample correlation between two Series, excluding missing values.

cos()

Get Trigonometric cosine, element-wise.

count([level])

Return number of non-NA/null observations in the Series

cov(other[, min_periods])

Compute covariance with Series, excluding missing values.

cummax([axis, skipna])

Return cumulative maximum of the Series.

cummin([axis, skipna])

Return cumulative minimum of the Series.

cumprod([axis, skipna])

Return cumulative product of the Series.

cumsum([axis, skipna])

Return cumulative sum of the Series.

describe([percentiles, include, exclude, …])

Generate descriptive statistics.

diff([periods])

Calculate the difference between values at positions i and i - N in an array and store the output in a new array.

digitize(bins[, right])

Return the indices of the bins to which each value in series belongs.

drop([labels, axis, index, columns, level, …])

Return Series with specified index labels removed.

drop_duplicates([keep, inplace, ignore_index])

Return Series with duplicate values removed.

dropna([axis, inplace, how])

Return a Series with null values removed.

eq(other[, fill_value, axis])

Equal to of series and other, element-wise (binary operator eq).

equals(other, **kwargs)

Test whether two objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

explode([ignore_index])

Transform each element of a list-like to a row, replicating index values.

factorize([na_sentinel])

Encode the input values as integer labels

fillna([value, method, axis, inplace, limit])

Fill null values with value or specified method.

floor()

Rounds each value downward to the largest integral value not greater than the original.

floordiv(other[, fill_value, axis])

Integer division of series and other, element-wise (binary operator floordiv).

from_arrow(array)

Convert from PyArrow Array/ChunkedArray to Series.

from_categorical(categorical[, codes])

Creates from a pandas.Categorical

from_masked_array(data, mask[, null_count])

Create a Series with null-mask.

from_pandas(s[, nan_as_null])

Convert from a Pandas Series.

ge(other[, fill_value, axis])

Greater than or equal to of series and other, element-wise (binary operator ge).

groupby([by, axis, level, as_index, sort, …])

Group Series using a mapper or by a Series of columns.

gt(other[, fill_value, axis])

Greater than of series and other, element-wise (binary operator gt).

hash_encode(stop[, use_name])

Encode column values as ints in [0, stop) using hash function.

hash_values()

Compute the hash of values in this column.

head([n])

Return the first n rows.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Check whether values are contained in Series.

isna()

Identify missing values.

isnull()

Identify missing values.

keys()

Return alias for index.

kurt([axis, skipna, level, numeric_only])

Return Fisher’s unbiased kurtosis of a sample.

kurtosis([axis, skipna, level, numeric_only])

Return Fisher’s unbiased kurtosis of a sample.

label_encoding(cats[, dtype, na_sentinel])

Perform label encoding

le(other[, fill_value, axis])

Less than or equal to of series and other, element-wise (binary operator le).

log()

Get the natural logarithm of all elements, element-wise.

lt(other[, fill_value, axis])

Less than of series and other, element-wise (binary operator lt).

map(arg[, na_action])

Map values of Series according to input correspondence.

mask(cond[, other, inplace])

Replace values where the condition is True.

max([axis, skipna, dtype, level, numeric_only])

Return the maximum of the values in the Series.

mean([axis, skipna, level, numeric_only])

Return the mean of the values in the series.

median([axis, skipna, level, numeric_only])

Return the median of the values for the requested axis.

memory_usage([index, deep])

Return the memory usage of the Series.

min([axis, skipna, dtype, level, numeric_only])

Return the minimum of the values in the Series.

mod(other[, fill_value, axis])

Modulo of series and other, element-wise (binary operator mod).

mode([dropna])

Return the mode(s) of the dataset.

mul(other[, fill_value, axis])

Multiplication of series and other, element-wise (binary operator mul).

multiply(other[, fill_value, axis])

Multiplication of series and other, element-wise (binary operator mul).

nans_to_nulls()

Convert nans (if any) to nulls

ne(other[, fill_value, axis])

Not equal to of series and other, element-wise (binary operator ne).

nlargest([n, keep])

Returns a new Series of the n largest elements.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

nsmallest([n, keep])

Returns a new Series of the n smallest elements.

nunique([method, dropna])

Returns the number of unique values of the Series: approximate version, and exact version to be moved to libgdf

one_hot_encoding(cats[, dtype])

Perform one-hot-encoding

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

pow(other[, fill_value, axis])

Exponential power of series and other, element-wise (binary operator pow).

prod([axis, skipna, dtype, level, …])

Return product of the values in the series

product([axis, skipna, dtype, level, …])

Return product of the values in the Series.

quantile([q, interpolation, exact, quant_index])

Return values at the given quantile.

radd(other[, fill_value, axis])

Addition of series and other, element-wise (binary operator radd).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

reindex([index, copy])

Return a Series that conforms to a new index

rename([index, copy])

Alter Series name

repeat(repeats[, axis])

Repeats elements consecutively.

replace([to_replace, value, inplace, limit, …])

Replace values given in to_replace with value.

reset_index([drop, inplace])

Reset index to RangeIndex

reverse()

Reverse the Series

rfloordiv(other[, fill_value, axis])

Integer division of series and other, element-wise (binary operator rfloordiv).

rmod(other[, fill_value, axis])

Modulo of series and other, element-wise (binary operator rmod).

rmul(other[, fill_value, axis])

Multiplication of series and other, element-wise (binary operator rmul).

rolling(window[, min_periods, center, axis, …])

Rolling window calculations.

round([decimals])

Round each value in a Series to the given number of decimals.

rpow(other[, fill_value, axis])

Exponential power of series and other, element-wise (binary operator rpow).

rsub(other[, fill_value, axis])

Subtraction of series and other, element-wise (binary operator rsub).

rtruediv(other[, fill_value, axis])

Floating division of series and other, element-wise (binary operator rtruediv).

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scale()

Scale values to [0, 1] in float64

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_index(index)

Returns a new Series with a different index.

set_mask(mask[, null_count])

Create new Series by setting a mask array.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

skew([axis, skipna, level, numeric_only])

Return unbiased Fisher-Pearson skew of a sample.

sort_index([ascending])

Sort by the index.

sort_values([axis, ascending, inplace, …])

Sort by the values.

sqrt()

Get the non-negative square-root of all elements, element-wise.

std([axis, skipna, level, ddof, numeric_only])

Return sample standard deviation of the Series.

sub(other[, fill_value, axis])

Subtraction of series and other, element-wise (binary operator sub).

subtract(other[, fill_value, axis])

Subtraction of series and other, element-wise (binary operator sub).

sum([axis, skipna, dtype, level, …])

Return sum of the values in the Series.

tail([n])

Returns the last n rows as a new Series

take(indices[, keep_index])

Return Series by taking values from the corresponding indices.

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Series to a PyArrow Array.

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([name])

Convert Series into a DataFrame

to_gpu_array([fillna])

Get a dense numba device array for the data.

to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

to_json([path_or_buf])

Convert the cuDF object to a JSON string.

to_pandas([index, nullable])

Convert to a Pandas Series.

to_string()

Convert to string

truediv(other[, fill_value, axis])

Floating division of series and other, element-wise (binary operator truediv).

unique()

Returns unique values of this Series.

update(other)

Modify Series in place using values from passed Series.

value_counts([normalize, sort, ascending, …])

Return a Series containing counts of unique values.

var([axis, skipna, level, ddof, numeric_only])

Return unbiased variance of the Series.

where(cond[, other, inplace])

Replace values where the condition is False.

abs()

Absolute value of each element of the series.

Returns
abs

Series containing the absolute value of each element.

Examples

>>> import cudf
>>> series = cudf.Series([-1.10, 2, -3.33, 4])
>>> series
0   -1.10
1    2.00
2   -3.33
3    4.00
dtype: float64
>>> series.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()
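
The round trip holds only for angles in [0, π], the range of values that inverse cosine returns. The scalar sketch below uses Python's math module rather than cuDF to show this; the helper name round_trip is hypothetical.

```python
import math


def round_trip(x):
    """Apply cos and then acos to a single angle in radians."""
    return math.acos(math.cos(x))


# Inside [0, pi] the original angle is recovered.
for angle in (0.0, 1.0, math.pi):
    assert math.isclose(round_trip(angle), angle)

# Outside that interval a different angle with the same cosine comes back.
print(round_trip(-1.0))
```

Inputs to acos outside [-1, 1] have no real result; math.acos raises ValueError for them, and this sketch does not cover how cuDF reports such values.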

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
add(other, fill_value=None, axis=0)

Addition of series and other, element-wise (binary operator add).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null
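
The following plain-Python sketch models index alignment together with the fill_value rule. It is an illustration under stated assumptions, not cuDF's implementation: each Series is represented as a dict from label to value, None stands for a null, and a label missing from one side is treated like a null on that side.

```python
def add(left, right, fill_value=None):
    """Add two {label: value} mappings over the sorted union of labels."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a = left.get(label)
        b = right.get(label)
        if a is None and b is None:
            result[label] = None  # null on both sides stays null
            continue
        # Substitute fill_value for the one side that is null.
        a = fill_value if a is None else a
        b = fill_value if b is None else b
        result[label] = None if a is None or b is None else a + b
    return result


a = {"a": 1, "b": 1, "c": 1, "d": None}
b = {"a": 1, "b": None, "d": 1, "e": None}

print(add(a, b))
# {'a': 2, 'b': None, 'c': None, 'd': None, 'e': None}
print(add(a, b, fill_value=0))
# {'a': 2, 'b': 1, 'c': 1, 'd': 1, 'e': None}
```

The two printed results correspond to the a.add(b) and a.add(b, fill_value=0) outputs shown under Examples.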

Returns
Series

The result of the addition.

Examples

>>> import cudf
>>> a = cudf.Series([1, 1, 1, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       1
c       1
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 1, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       1
e    <NA>
dtype: int64
>>> a.add(b)
a       2
b    <NA>
c    <NA>
d    <NA>
e    <NA>
dtype: int64
>>> a.add(b, fill_value=0)
a       2
b       1
c       1
d       1
e    <NA>
dtype: int64
all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True in Series.

Parameters
skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns
scalar

Notes

Parameters currently not supported are axis, bool_only, level.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.all()
True
any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True in Series.

Parameters
skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
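
The skipna rules stated for all and any can be written out in plain Python. This sketch follows the wording of the parameter descriptions literally and has not been checked against cuDF's behavior; the function names are hypothetical and None stands for a null.

```python
def series_all(values, skipna=True):
    """True if every element is truthy under the documented skipna rule."""
    if skipna:
        # Nulls are dropped; an empty result is vacuously True.
        return all(bool(v) for v in values if v is not None)
    # Per the description, nulls count as True when skipna is False.
    return all(True if v is None else bool(v) for v in values)


def series_any(values, skipna=True):
    """True if at least one element is truthy under the documented rule."""
    if skipna:
        # Nulls are dropped; an empty result is False.
        return any(bool(v) for v in values if v is not None)
    return any(True if v is None else bool(v) for v in values)


print(series_all([1, 5, 2, 4, 3]))   # True
print(series_all([None, None]))      # True
print(series_any([None, None]))      # False
```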

Returns
scalar

Notes

Parameters currently not supported are axis, bool_only, level.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.any()
True
append(to_append, ignore_index=False, verify_integrity=False)

Append values from another Series or array-like object. If ignore_index=True, the index is reset.

Parameters
to_appendSeries or list/tuple of Series

Series to append with self.

ignore_indexboolean, default False.

If True, do not use the index.

verify_integritybool, default False

This Parameter is currently not supported.

Returns
Series

A new concatenated series

See also

cudf.core.reshape.concat

General function to concatenate DataFrame or Series objects.

Examples

>>> import cudf
>>> s1 = cudf.Series([1, 2, 3])
>>> s2 = cudf.Series([4, 5, 6])
>>> s1
0    1
1    2
2    3
dtype: int64
>>> s2
0    4
1    5
2    6
dtype: int64
>>> s1.append(s2)
0    1
1    2
2    3
0    4
1    5
2    6
dtype: int64
>>> s3 = cudf.Series([4, 5, 6], index=[3, 4, 5])
>>> s3
3    4
4    5
5    6
dtype: int64
>>> s1.append(s3)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

With ignore_index set to True:

>>> s1.append(s2, ignore_index=True)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
applymap(udf, out_dtype=None)

Apply an elementwise function to transform the values in the Column.

The user function is expected to take one argument and return the result, which will be stored to the output Series. The function cannot reference globals except for other simple scalar objects.

Parameters
udffunction

Either a callable Python function or a Python function already decorated by numba.cuda.jit for call on the GPU as a device function.

out_dtypenumpy.dtype; optional

The dtype for use in the output. Only used for numba.cuda.jit decorated udf. By default, the result will have the same dtype as the source.

Returns
resultSeries

The mask and index are preserved.

Notes

The supported Python features are listed in

with these exceptions:

  • Math functions in cmath are not supported, since libcudf does not have complex number support and the outputs of cmath functions are most likely complex numbers.

  • These five functions in math are not supported since numba generates multiple PTX functions from them

    • math.sin()

    • math.cos()

    • math.tan()

    • math.gamma()

    • math.lgamma()

  • Series with string dtypes are not supported in applymap method.

  • Global variables need to be re-defined explicitly inside the udf, as numba considers them to be compile-time constants and there is no known way to obtain value of the global variable.
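
A plain-Python model of the elementwise contract may help when writing a UDF: the function receives one value and returns one value, nulls pass through untouched, and any constant the function needs is defined inside its body rather than as a module-level global. The applymap helper below is a hypothetical CPU stand-in, not the GPU code path, and None represents a null.

```python
def applymap(values, udf):
    """Apply `udf` to each non-null element, leaving nulls in place."""
    return [None if value is None else udf(value) for value in values]


def custom_udf(x):
    # The constant lives inside the function instead of being a global.
    offset = 5
    if x > 0:
        return x + offset
    return x - offset


print(applymap([1, 10, -10, 200, 100], custom_udf))
# [6, 15, -15, 205, 105]
print(applymap([1, None, -10], custom_udf))
# [6, None, -15]
```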

Examples

Applying elementwise functions, including lambdas and named functions, to a Series:

>>> import cudf
>>> s = cudf.Series([1, 10, -10, 200, 100])
>>> s.applymap(lambda x: x)
0      1
1     10
2    -10
3    200
4    100
dtype: int64
>>> s.applymap(lambda x: x in [1, 100, 59])
0     True
1    False
2    False
3    False
4     True
dtype: bool
>>> s.applymap(lambda x: x ** 2)
0        1
1      100
2      100
3    40000
4    10000
dtype: int64
>>> s.applymap(lambda x: (x ** 2) + (x / 2))
0        1.5
1      105.0
2       95.0
3    40100.0
4    10050.0
dtype: float64
>>> def cube_function(a):
...     return a ** 3
...
>>> s.applymap(cube_function)
0          1
1       1000
2      -1000
3    8000000
4    1000000
dtype: int64
>>> def custom_udf(x):
...     if x > 0:
...         return x + 5
...     else:
...         return x - 5
...
>>> s.applymap(custom_udf)
0      6
1     15
2    -15
3    205
4    105
dtype: int64
argsort(ascending=True, na_position='last')

Returns a Series of int64 index that will sort the series.

Uses Thrust sort.

Returns
result: Series
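
The meaning of the returned positions can be illustrated with a plain-Python sketch: element i of the result is the position of the i-th smallest value. The function below is a hypothetical CPU model using Python's built-in sort rather than Thrust, with None standing for a null.

```python
def argsort(values, ascending=True, na_position="last"):
    """Return the positions that would sort `values`, nulls grouped together."""
    valid = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    valid.sort(key=lambda i: values[i], reverse=not ascending)
    return nulls + valid if na_position == "first" else valid + nulls


s = [3, 1, 2]
order = argsort(s)
print(order)                  # [1, 2, 0]
print([s[i] for i in order])  # [1, 2, 3]
```

Indexing the original data with the returned positions yields the sorted values, which is what s[s.argsort()] does in the example under Examples.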

Examples

>>> import cudf
>>> s = cudf.Series([3, 1, 2])
>>> s
0    3
1    1
2    2
dtype: int64
>>> s.argsort()
0    1
1    2
2    0
dtype: int32
>>> s[s.argsort()]
1    1
2    2
0    3
dtype: int64
as_index()

Returns a new Series with a RangeIndex.

Examples

>>> s = cudf.Series([1,2,3], index=['a','b','c'])
>>> s
a    1
b    2
c    3
dtype: int64
>>> s.as_index()
0    1
1    2
2    3
dtype: int64
as_mask()

Convert booleans to bitmask

Returns
device array
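
A bitmask stores one bit per element instead of one byte. The sketch below packs booleans into bytes in plain Python, assuming least-significant-bit-first ordering within each byte; that ordering is an assumption here, chosen because it matches the Arrow validity-bitmap convention and the leading byte 5 (binary 101) in the example under Examples. The helper name pack_bits is hypothetical.

```python
def pack_bits(flags):
    """Pack a list of booleans into bytes, least significant bit first."""
    packed = bytearray((len(flags) + 7) // 8)
    for position, flag in enumerate(flags):
        if flag:
            packed[position // 8] |= 1 << (position % 8)
    return bytes(packed)


print(list(pack_bits([True, False, True])))  # [5]
print(list(pack_bits([True] * 9)))           # [255, 1]
```

Only the bits covering actual elements are meaningful. A device buffer can be padded well beyond them, which would explain the additional bytes after the first in the example output.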

Examples

>>> import cudf
>>> s = cudf.Series([True, False, True])
>>> s.as_mask()
<cudf.core.buffer.Buffer object at 0x7f23c3eed0d0>
>>> s.as_mask().to_host_array()
array([  5,   0,   0,   0,   0,   0,   0,   0,   1,   0,   0,   0,   0,
         0,   0,   0,   2,   0,   0,   0,   0,   0,   0,   0, 181, 164,
       188,   1,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255,
       127, 253, 214,  62, 241,   1,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
     dtype=uint8)
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False, errors='raise')

Cast the Series to the given dtype

Parameters
dtypedata type, or dict of column name -> data type

Use a numpy.dtype or Python type to cast Series object to the same type. Alternatively, use {col: dtype, …}, where col is a series name and dtype is a numpy.dtype or Python type to cast to.

copybool, default False

Return a deep copy when copy=True. Note that copy=False is the default, so changes to the values of the result may propagate to other cudf objects.

errors{‘raise’, ‘ignore’, ‘warn’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised

  • ignore : suppress exceptions. On error return original object.

  • warn : prints the last exception as a warning and returns the original object.

Returns
outSeries

Returns self.copy(deep=copy) if dtype is the same as self.dtype.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2], dtype='int32')
>>> series
0    1
1    2
dtype: int32
>>> series.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> series.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> cat_dtype = cudf.CategoricalDtype(categories=[2, 1], ordered=True)
>>> series.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using copy=False (enabled by default) and changing data on a new Series will propagate changes:

>>> s1 = cudf.Series([1, 2])
>>> s1
0    1
1    2
dtype: int64
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1
0    10
1     2
dtype: int64
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
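The inverse relationship stated above holds only on the principal branch: atan returns values in (-pi/2, pi/2), so x is recovered from x.tan() only when x lies in that interval. A minimal sketch using Python's standard math module rather than cuDF (the variable names are illustrative):

```python
import math

# Inside the principal range the round trip recovers x.
x = 0.5
roundtrip = math.atan(math.tan(x))

# Outside the range the result is shifted by a multiple of pi.
outside = 2.0
shifted = math.atan(math.tan(outside))  # close to 2.0 - pi
```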
property cat

Accessor object for categorical properties of the Series values. Be aware that assigning to categories is an in-place operation, while all methods return new categorical data by default.

Parameters
columnColumn
parentSeries or CategoricalIndex

Examples

>>> s = cudf.Series([1,2,3], dtype='category')
>>> s
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
>>> s.cat.categories
Int64Index([1, 2, 3], dtype='int64')
>>> s.cat.reorder_categories([3,2,1])
0    1
1    2
2    3
dtype: category
Categories (3, int64): [3, 2, 1]
>>> s.cat.remove_categories([1])
0    <NA>
1       2
2       3
dtype: category
Categories (2, int64): [2, 3]
>>> s.cat.set_categories(list('abcde'))
0    <NA>
1    <NA>
2    <NA>
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']
>>> s.cat.as_ordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1 < 2 < 3]
>>> s.cat.as_unordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
ceil()

Rounds each value upward to the smallest integral value not less than the original.

Returns
res

Returns a new Series with ceiling value of each element.

Examples

>>> import cudf
>>> series = cudf.Series([1.1, 2.8, 3.5, 4.5])
>>> series
0    1.1
1    2.8
2    3.5
3    4.5
dtype: float64
>>> series.ceil()
0    2.0
1    3.0
2    4.0
3    5.0
dtype: float64
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
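The scalar-threshold rule can be sketched in plain Python, independent of cuDF. The helper clip_values is hypothetical and is not cuDF's implementation; it only mirrors the behaviour shown in the Series examples above, where None disables a bound.

```python
def clip_values(values, lower=None, upper=None):
    """Plain-Python sketch of scalar clipping (not cuDF's implementation)."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the lower bound are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the upper bound are capped at it
        out.append(v)
    return out

both = clip_values([1, 2, 3, 4], lower=2, upper=3)
upper_only = clip_values([1, 2, 3, 4], upper=3)
lower_only = clip_values([1, 2, 3, 4], lower=2)
```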
copy(deep: bool = True) → T

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below). When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Parameters
deepbool, default True

Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns
copySeries or DataFrame

Object type matches caller.

Examples

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s._column is shallow._column and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by the shallow copy and the original are reflected in both; the deep copy remains unchanged.

>>> s['a'] = 3
>>> shallow['b'] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64
corr(other, method='pearson', min_periods=None)

Calculates the sample correlation between two Series, excluding missing values.

Examples

>>> import cudf
>>> ser1 = cudf.Series([0.9, 0.13, 0.62])
>>> ser2 = cudf.Series([0.12, 0.26, 0.51])
>>> ser1.corr(ser2)
-0.20454263717316112
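For reference, the Pearson coefficient in the example can be reproduced in plain Python. This is a sketch only: pearson_corr is a hypothetical helper, and it assumes missing values are excluded pairwise (a position is dropped when either value is None), which is how pandas treats them.

```python
import math

def pearson_corr(xs, ys):
    # Keep only positions where both values are present (assumed pairwise exclusion).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_corr([0.9, 0.13, 0.62], [0.12, 0.26, 0.51])
# A pair with a missing value does not change the result.
r_with_gap = pearson_corr([0.9, 0.13, None, 0.62], [0.12, 0.26, 0.4, 0.51])
```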
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
count(level=None, **kwargs)

Return number of non-NA/null observations in the Series

Returns
int

Number of non-null values in the Series.

Notes

The level parameter is not currently supported.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.count()
5
cov(other, min_periods=None)

Compute covariance with Series, excluding missing values.

Parameters
otherSeries

Series with which to compute the covariance.

Returns
float

Covariance between Series and other normalized by N-1 (unbiased estimator).

Notes

min_periods parameter is not yet supported.

Examples

>>> import cudf
>>> ser1 = cudf.Series([0.9, 0.13, 0.62])
>>> ser2 = cudf.Series([0.12, 0.26, 0.51])
>>> ser1.cov(ser2)
-0.015750000000000004
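The N-1 normalization mentioned under Returns can be checked by hand. A minimal plain-Python sketch, assuming no missing values; sample_cov is a hypothetical helper, not part of cuDF:

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance: sum of products of deviations over N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

c = sample_cov([0.9, 0.13, 0.62], [0.12, 0.26, 0.51])
```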
cummax(axis=0, skipna=True, *args, **kwargs)

Return cumulative maximum of the Series.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
Series

Notes

The axis parameter is not currently supported.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cummax()
0    1
1    5
2    5
3    5
4    5
cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum of the Series.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
Series

Notes

The axis parameter is not currently supported.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cummin()
0    1
1    1
2    1
3    1
4    1
cumprod(axis=0, skipna=True, *args, **kwargs)

Return cumulative product of the Series.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
Series

Notes

The axis parameter is not currently supported.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cumprod()
0      1
1      5
2     10
3     40
4    120
cumsum(axis=0, skipna=True, *args, **kwargs)

Return cumulative sum of the Series.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
Series

Notes

The axis parameter is not currently supported.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cumsum()
0     1
1     6
2     8
3    12
4    15
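The four cumulative operations (cummax, cummin, cumprod, cumsum) are all running reductions. A plain-Python sketch using itertools.accumulate reproduces the example outputs; it assumes no null values and is not how cuDF computes them on the GPU.

```python
from itertools import accumulate
import operator

values = [1, 5, 2, 4, 3]

running_max = list(accumulate(values, max))            # cummax
running_min = list(accumulate(values, min))            # cummin
running_prod = list(accumulate(values, operator.mul))  # cumprod
running_sum = list(accumulate(values))                 # cumsum (addition is the default)
```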
property data

The gpu buffer for the data

Returns
outThe GPU buffer of the Series.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series
0    1
1    2
2    3
3    4
dtype: int64
>>> series.data
<cudf.core.buffer.Buffer object at 0x7f23c192d110>
>>> series.data.to_host_array()
array([1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0, 4, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
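The uint8 array above is the raw byte view of four int64 values. Assuming little-endian byte order (which matches the output shown), the same bytes can be reproduced with Python's struct module:

```python
import struct

# "<4q" packs four signed 64-bit integers in little-endian order.
raw = struct.pack("<4q", 1, 2, 3, 4)
host_bytes = list(raw)  # 32 bytes: each value occupies 8 bytes, low byte first
```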
describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters
percentileslist-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include‘all’, list-like of dtypes or None(default), optional

A list of data types to include in the result. Ignored for Series. Here are the options:

  • ‘all’ : All columns of the input will be included in the output.

  • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

  • None (default) : The result will include all numeric columns.

excludelist-like of dtypes or None (default), optional

A list of data types to omit from the result. Ignored for Series. Here are the options:

  • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'

  • None (default) : The result will exclude nothing.

datetime_is_numericbool, default False

For DataFrame input, this also controls whether datetime columns are included by default.

Returns
output_frameSeries or DataFrame

Summary statistics of the Series or Dataframe provided.

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For strings dtype or datetime dtype, the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples

Describing a Series containing numeric values.

>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> s
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
>>> s.describe()
count    10.00000
mean      5.50000
std       3.02765
min       1.00000
25%       3.25000
50%       5.50000
75%       7.75000
max      10.00000
dtype: float64

Describing a categorical Series.

>>> s = cudf.Series(['a', 'b', 'a', 'b', 'c', 'a'], dtype='category')
>>> s
0    a
1    b
2    a
3    b
4    c
5    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
>>> s.describe()
count     6
unique    3
top       a
freq      3
dtype: object

Describing a timestamp Series.

>>> import numpy as np
>>> s = cudf.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s
0   2000-01-01
1   2010-01-01
2   2010-01-01
dtype: datetime64[s]
>>> s.describe()
count                                3
mean     2006-09-01 08:00:00.000000000
min      2000-01-01 00:00:00.000000000
25%      2004-12-31 12:00:00.000000000
50%      2010-01-01 00:00:00.000000000
75%      2010-01-01 00:00:00.000000000
max      2010-01-01 00:00:00.000000000
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = cudf.DataFrame({"categorical": cudf.Series(['d', 'e', 'f'],
...                         dtype='category'),
...                      "numeric": [1, 2, 3],
...                      "object": ['a', 'b', 'c']
... })
>>> df
  categorical  numeric object
0           d        1      a
1           e        2      b
2           f        3      c
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical numeric object
count            3     3.0      3
unique           3    <NA>      3
top              d    <NA>      a
freq             1    <NA>      1
mean          <NA>     2.0   <NA>
std           <NA>     1.0   <NA>
min           <NA>     1.0   <NA>
25%           <NA>     1.5   <NA>
50%           <NA>     2.0   <NA>
75%           <NA>     2.5   <NA>
max           <NA>     3.0   <NA>

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              d      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical numeric
count            3     3.0
unique           3    <NA>
top              d    <NA>
freq             1    <NA>
mean          <NA>     2.0
std           <NA>     1.0
min           <NA>     1.0
25%           <NA>     1.5
50%           <NA>     2.0
75%           <NA>     2.5
max           <NA>     3.0
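The numeric summary for the first Series example (values 1 through 10) can be reproduced with the standard library. This sketch assumes percentiles use linear interpolation between neighbouring sorted values, which is consistent with the 25%, 50% and 75% rows shown above; percentile is a hypothetical helper.

```python
import statistics

def percentile(sorted_values, q):
    """Linearly interpolated percentile for q in [0, 1] (assumed method)."""
    pos = q * (len(sorted_values) - 1)  # fractional position in the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

data = sorted(range(1, 11))
summary = {
    "count": len(data),
    "mean": statistics.mean(data),
    "std": statistics.stdev(data),  # sample standard deviation (N - 1)
    "min": data[0],
    "25%": percentile(data, 0.25),
    "50%": percentile(data, 0.50),
    "75%": percentile(data, 0.75),
    "max": data[-1],
}
```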
diff(periods=1)

Calculate the difference between the values at positions i and i - periods, and return the result in a new Series.

Returns
Series

First differences of the Series.

Notes

Diff currently only supports float and integer dtype columns with no null values.

Examples

>>> import cudf
>>> series = cudf.Series([1, 1, 2, 3, 5, 8])
>>> series
0    1
1    1
2    2
3    3
4    5
5    8
dtype: int64

Difference with previous row

>>> series.diff()
0    <NA>
1       0
2       1
3       1
4       2
5       3
dtype: int64

Difference with 3rd previous row

>>> series.diff(periods=3)
0    <NA>
1    <NA>
2    <NA>
3       2
4       4
5       6
dtype: int64

Difference with following row

>>> series.diff(periods=-1)
0       0
1      -1
2      -1
3      -2
4      -3
5    <NA>
dtype: int64
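The role of periods, including negative values, can be sketched in plain Python. The helper diff_values is hypothetical; None stands in for the <NA> entries shown above.

```python
def diff_values(values, periods=1):
    """Element i minus element i - periods; None where that position does not exist."""
    n = len(values)
    out = []
    for i in range(n):
        j = i - periods
        out.append(values[i] - values[j] if 0 <= j < n else None)
    return out

series = [1, 1, 2, 3, 5, 8]
prev_row = diff_values(series)
third_prev = diff_values(series, periods=3)
next_row = diff_values(series, periods=-1)
```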
digitize(bins, right=False)

Return the indices of the bins to which each value in series belongs.

Parameters
binsnp.array

1-D monotonically increasing array with the same type as this series.

rightbool

Indicates whether interval contains the right or left bin edge.

Returns
A new Series containing the indices.

Notes

Monotonicity of bins is assumed and not checked.

Examples

>>> import cudf
>>> s = cudf.Series([0.2, 6.4, 3.0, 1.6])
>>> bins = cudf.Series([0.0, 1.0, 2.5, 4.0, 10.0])
>>> inds = s.digitize(bins)
>>> inds
0    1
1    4
2    3
3    2
dtype: int32
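For increasing bins, the effect of right can be sketched with the standard bisect module. This assumes digitize follows the numpy.digitize convention (an assumption, not confirmed by the text above): with right=False a value equal to a bin edge falls into the bin to the right of that edge, and with right=True into the bin to its left. The helper digitize_values is hypothetical.

```python
from bisect import bisect_left, bisect_right

def digitize_values(values, bins, right=False):
    """Bin indices for monotonically increasing bins (numpy.digitize convention assumed)."""
    find = bisect_left if right else bisect_right
    return [find(bins, v) for v in values]

bins = [0.0, 1.0, 2.5, 4.0, 10.0]
inds = digitize_values([0.2, 6.4, 3.0, 1.6], bins)

# A value sitting exactly on an edge shows the difference between the two modes.
on_edge_left_closed = digitize_values([2.5], bins, right=False)
on_edge_right_closed = digitize_values([2.5], bins, right=True)
```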
drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Return Series with specified index labels removed.

Remove elements of a Series based on specifying the index labels. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters
labelssingle label or list-like

Index labels to drop.

axis0, default 0

Redundant for application on Series.

indexsingle label or list-like

Redundant for application on Series, but index can be used instead of labels.

columnssingle label or list-like

This parameter is ignored. Use index or labels to specify.

levelint or level name, optional

For MultiIndex, level from which the labels will be removed.

inplacebool, default False

If False, return a copy. Otherwise, do operation inplace and return None.

errors{‘ignore’, ‘raise’}, default ‘raise’

If ‘ignore’, suppress error and only existing labels are dropped.

Returns
Series or None

Series with specified index labels removed or None if inplace=True

Raises
KeyError

If any of the labels is not found in the selected axis and errors='raise'

See also

Series.reindex

Return only specified index labels of Series

Series.dropna

Return series without null values

Series.drop_duplicates

Return series with duplicate values removed

cudf.core.dataframe.DataFrame.drop

Drop specified labels from rows or columns in dataframe

Examples

>>> s = cudf.Series([1,2,3], index=['x', 'y', 'z'])
>>> s
x    1
y    2
z    3
dtype: int64

Drop labels x and z

>>> s.drop(labels=['x', 'z'])
y    2
dtype: int64

Drop a label from the second level in MultiIndex Series.

>>> midx = cudf.MultiIndex.from_product([[0, 1, 2], ['x', 'y']])
>>> s = cudf.Series(range(6), index=midx)
>>> s
0  x    0
   y    1
1  x    2
   y    3
2  x    4
   y    5
>>> s.drop(labels='y', level=1)
0  x    0
1  x    2
2  x    4
drop_duplicates(keep='first', inplace=False, ignore_index=False)

Return Series with duplicate values removed.

Parameters
keep{‘first’, ‘last’, False}, default ‘first’

Method to handle dropping duplicates:

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

inplacebool, default False

If True, performs operation inplace and returns None.

Returns
Series or None

Series with duplicates dropped or None if inplace=True.

Examples

>>> s = cudf.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
...               name='animal')
>>> s
0      lama
1       cow
2      lama
3    beetle
4      lama
5     hippo
Name: animal, dtype: object

With the keep parameter, the selection behaviour of duplicated values can be changed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’. Note that order of the rows being returned is not guaranteed to be sorted.

>>> s.drop_duplicates()
3    beetle
1       cow
5     hippo
0      lama
Name: animal, dtype: object

The value ‘last’ for parameter keep keeps the last occurrence for each set of duplicated entries.

>>> s.drop_duplicates(keep='last')
3    beetle
1       cow
5     hippo
4      lama
Name: animal, dtype: object

The value False for parameter keep discards all sets of duplicated entries. Setting the value of ‘inplace’ to True performs the operation inplace and returns None.

>>> s.drop_duplicates(keep=False, inplace=True)
>>> s
3    beetle
1       cow
5     hippo
Name: animal, dtype: object
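The three keep modes can be sketched in plain Python. The helper below is hypothetical and returns (position, value) pairs in positional order, whereas the cuDF outputs above make no ordering guarantee.

```python
from collections import Counter

def drop_duplicate_values(values, keep="first"):
    """Return (position, value) pairs that survive de-duplication."""
    if keep is False:
        counts = Counter(values)
        # Discard every value that occurs more than once.
        return [(i, v) for i, v in enumerate(values) if counts[v] == 1]
    items = list(enumerate(values))
    if keep == "last":
        items.reverse()  # scanning from the end keeps the last occurrence
    seen = set()
    kept = []
    for i, v in items:
        if v not in seen:
            seen.add(v)
            kept.append((i, v))
    return sorted(kept)

animals = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']
first = drop_duplicate_values(animals)
last = drop_duplicate_values(animals, keep="last")
none_kept = drop_duplicate_values(animals, keep=False)
```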
dropna(axis=0, inplace=False, how=None)

Return a Series with null values removed.

Parameters
axis{0 or ‘index’}, default 0

There is only one axis to drop values from.

inplacebool, default False

If True, do operation inplace and return None.

howstr, optional

Not in use. Kept for compatibility.

Returns
Series

Series with null entries dropped from it.

See also

Series.isna

Indicate null values.

Series.notna

Indicate non-null values.

Series.fillna

Replace null values.

cudf.core.dataframe.DataFrame.dropna

Drop rows or columns which contain null values.

cudf.core.index.Index.dropna

Drop null indices.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 2, None])
>>> ser
0       1
1       2
2    <NA>
dtype: int64

Drop null values from a Series.

>>> ser.dropna()
0    1
1    2
dtype: int64

Keep the Series with valid entries in the same variable.

>>> ser.dropna(inplace=True)
>>> ser
0    1
1    2
dtype: int64

Empty strings are not considered null values. None is considered a null value.

>>> ser = cudf.Series(['', None, 'abc'])
>>> ser
0
1    <NA>
2     abc
dtype: object
>>> ser.dropna()
0
2    abc
dtype: object
property dt

Accessor object for datetimelike properties of the Series values.

Returns
A Series indexed like the original Series.
Raises
TypeError if the Series does not contain datetimelike values.

Examples

>>> s.dt.hour
>>> s.dt.second
>>> s.dt.day
property dtype

dtype of the Series

property empty

Indicator whether DataFrame or Series is empty.

True if DataFrame/Series is entirely empty (no items), meaning any of the axes are of length 0.

Returns
outbool

If DataFrame/Series is empty, return True, if not return False.

Notes

If DataFrame/Series contains only null values, it is still not considered empty. See the example below.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'A' : []})
>>> df
Empty DataFrame
Columns: [A]
Index: []
>>> df.empty
True

If we only have null values in our DataFrame, it is not considered empty! We will need to drop the nulls to make the DataFrame empty:

>>> df = cudf.DataFrame({'A' : [None, None]})
>>> df
      A
0  <NA>
1  <NA>
>>> df.empty
False
>>> df.dropna().empty
True

Non-empty and empty Series example:

>>> s = cudf.Series([1, 2, None])
>>> s
0       1
1       2
2    <NA>
dtype: int64
>>> s.empty
False
>>> s = cudf.Series([])
>>> s
Series([], dtype: float64)
>>> s.empty
True
eq(other, fill_value=None, axis=0)

Equal to of series and other, element-wise (binary operator eq).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.eq(b, fill_value=2)
a    False
b    False
c    False
d    False
e     <NA>
f    False
g    False
dtype: bool
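The interaction between index alignment and fill_value in the example can be sketched with dictionaries keyed by label. This is an illustration of the rule stated under Parameters, not cuDF's implementation: a label missing from one side is treated as null, None stands in for <NA>, and eq_with_fill is a hypothetical helper.

```python
def eq_with_fill(a, b, fill_value=None):
    """a and b map label -> value, with None standing in for null."""
    result = {}
    for label in sorted(set(a) | set(b)):
        x, y = a.get(label), b.get(label)
        if x is None and y is None:
            result[label] = None  # null on both sides stays null
            continue
        if x is None:
            x = fill_value
        if y is None:
            y = fill_value
        # Without a fill value, a one-sided null still yields null.
        result[label] = None if x is None or y is None else x == y
    return result

a = {'a': 1, 'c': 2, 'd': 3, 'e': None, 'f': 10, 'g': 20}
b = {'a': -10, 'b': 23, 'c': -1, 'd': None, 'e': None}
filled = eq_with_fill(a, b, fill_value=2)
unfilled = eq_with_fill(a, b)
```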
equals(other, **kwargs)

Test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.

Parameters
otherSeries or DataFrame

The other Series or DataFrame to be compared with the first.

Returns
bool

True if all elements are the same in both objects, False otherwise.

Examples

>>> import cudf

Comparing Series with equals:

>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False

Comparing DataFrames with equals:

>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

For two DataFrames to compare equal, the types of column values must be equal, but the types of column labels need not:

>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
explode(ignore_index=False)

Transform each element of a list-like to a row, replicating index values.

Parameters
ignore_indexbool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.

Returns
Series

Examples

>>> import cudf
>>> s = cudf.Series([[1, 2, 3], [], None, [4, 5]])
>>> s
0    [1, 2, 3]
1           []
2         None
3       [4, 5]
dtype: list
>>> s.explode()
0       1
0       2
0       3
1    <NA>
2    <NA>
3       4
3       5
dtype: int64
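The row replication in the example can be sketched in plain Python. The helper explode_values is hypothetical; it returns (index, value) pairs and, matching the output above, emits a single null row for both an empty list and a null entry.

```python
def explode_values(values, ignore_index=False):
    """Return (index, element) pairs, one per list element."""
    rows = []
    for idx, item in enumerate(values):
        if item:
            # Each element of a non-empty list gets its own row with the same index.
            rows.extend((idx, element) for element in item)
        else:
            rows.append((idx, None))  # [] and None both become one null row
    if ignore_index:
        rows = [(new_idx, value) for new_idx, (_, value) in enumerate(rows)]
    return rows

exploded = explode_values([[1, 2, 3], [], None, [4, 5]])
relabeled = explode_values([[1, 2, 3], [], None, [4, 5]], ignore_index=True)
```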
factorize(na_sentinel=-1)

Encode the input values as integer labels

Parameters
na_sentinelnumber

Value to indicate missing category.

Returns
(labels, cats)(Series, Series)
  • labels contains the encoded values

  • cats contains the categories, ordered so that the N-th item corresponds to code N-1.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'a', 'c'])
>>> codes, uniques = s.factorize()
>>> codes
0    0
1    0
2    1
dtype: int8
>>> uniques
0    a
1    c
dtype: object
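The relationship between labels and cats can be sketched in plain Python. The helper below is hypothetical, and sorting the categories is an assumption made for the sketch; the text above does not state the order in which cuDF returns them.

```python
def factorize_values(values, na_sentinel=-1):
    """Return (labels, cats) where labels[i] is the position of values[i] in cats."""
    cats = sorted({v for v in values if v is not None})  # assumed ordering
    code_of = {cat: code for code, cat in enumerate(cats)}
    labels = [na_sentinel if v is None else code_of[v] for v in values]
    return labels, cats

codes, uniques = factorize_values(['a', 'a', 'c'])
codes_with_null, _ = factorize_values(['a', None, 'c'])
```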
fillna(value=None, method=None, axis=None, inplace=False, limit=None)

Fill null values with value or specified method.

Parameters
valuescalar, Series-like or dict

Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns. Cannot be used with method.

method{‘ffill’, ‘bfill’}, default None

Method to use for filling null values in the dataframe or series. ffill propagates the last non-null values forward to the next non-null value. bfill propagates backward with the next non-null value. Cannot be used with value.

Returns
resultDataFrame

Copy with nulls filled.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  <NA>
2  <NA>     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5

fillna on a Series object:

>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    <NA>
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object

fillna also supports in-place operation:

>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5

fillna with a fill method:

>>> ser = cudf.Series([1, None, None, 2, 3, None, None])
>>> ser.fillna(method='ffill')
0    1
1    1
2    1
3    2
4    3
5    3
6    3
dtype: int64
>>> ser.fillna(method='bfill')
0       1
1       2
2       2
3       2
4       3
5    <NA>
6    <NA>
dtype: int64
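The two fill methods can be sketched in plain Python, with None standing in for <NA>. The helpers ffill and bfill here are illustrative stand-ins, not cuDF functions.

```python
def ffill(values):
    """Carry the last non-null value forward; leading nulls stay null."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def bfill(values):
    """Backward fill is a forward fill applied to the reversed sequence."""
    return ffill(values[::-1])[::-1]

data = [1, None, None, 2, 3, None, None]
forward = ffill(data)
backward = bfill(data)
```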
floor()

Rounds each value downward to the largest integral value not greater than the original.

Returns
res

Returns a new Series with floor of each element.

Examples

>>> import cudf
>>> series = cudf.Series([-1.9, 2, 0.2, 1.5, 0.0, 3.0])
>>> series
0   -1.9
1    2.0
2    0.2
3    1.5
4    0.0
5    3.0
dtype: float64
>>> series.floor()
0   -2.0
1    2.0
2    0.0
3    1.0
4    0.0
5    3.0
dtype: float64
floordiv(other, fill_value=None, axis=0)

Integer division of series and other, element-wise (binary operator floordiv).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 1, 1, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       1
c       1
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 1, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       1
e    <NA>
dtype: int64
>>> a.floordiv(b)
a       1
b    <NA>
c    <NA>
d    <NA>
e    <NA>
dtype: int64
classmethod from_arrow(array)

Convert from PyArrow Array/ChunkedArray to Series.

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to cudf Series.

Returns
cudf Series
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Series.from_arrow(pa.array(["a", "b", None]))
0       a
1       b
2    <NA>
dtype: object
classmethod from_categorical(categorical, codes=None)

Create a Series from a pandas.Categorical.

Parameters
categoricalpandas.Categorical

Contains data stored in a pandas Categorical.

codesarray-like, optional.

The category codes of this categorical. If codes are defined, they are used instead of categorical.codes

Returns
Series

A cudf categorical series.

Examples

>>> import cudf
>>> import pandas as pd
>>> pd_categorical = pd.Categorical(pd.Series(['a', 'b', 'c', 'a'], dtype='category'))
>>> pd_categorical
['a', 'b', 'c', 'a']
Categories (3, object): ['a', 'b', 'c']
>>> series = cudf.Series.from_categorical(pd_categorical)
>>> series
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
classmethod from_masked_array(data, mask, null_count=None)

Create a Series with null-mask. This is equivalent to:

Series(data).set_mask(mask, null_count=null_count)

Parameters
data1D array-like

The values. Null values must not be skipped. They can appear as garbage values.

mask1D array-like

The null-mask. Valid values are marked as 1; otherwise 0. The mask bit given the data index idx is computed as:

(mask[idx // 8] >> (idx % 8)) & 1
null_countint, optional

The number of null values. If None, it is calculated automatically.

Returns
Series

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4, None])
>>> a
0       1
1       2
2       3
3    <NA>
4       4
5    <NA>
dtype: int64
>>> b = cudf.Series([10, 11, 12, 13, 14])
>>> cudf.Series.from_masked_array(data=b, mask=a._column.mask)
0      10
1      11
2      12
3    <NA>
4      14
dtype: int64
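
The mask-bit formula above can be tried out on a plain bytes object. The sketch below packs validity flags into a byte mask and reads them back with the documented expression; pack_mask and is_valid are hypothetical helper names, and real device masks may carry extra padding that is not modeled here.

```python
def is_valid(mask: bytes, idx: int) -> int:
    """Return 1 if element idx is valid and 0 if it is null."""
    return (mask[idx // 8] >> (idx % 8)) & 1


def pack_mask(valid_flags):
    """Pack booleans (True = valid) into bytes, least significant bit first."""
    mask = bytearray((len(valid_flags) + 7) // 8)
    for idx, flag in enumerate(valid_flags):
        if flag:
            mask[idx // 8] |= 1 << (idx % 8)
    return bytes(mask)


# Same validity pattern as Series a in the example above.
flags = [True, True, True, False, True, False]
mask = pack_mask(flags)
print(mask)                                    # b'\x17'
print([is_valid(mask, i) for i in range(6)])   # [1, 1, 1, 0, 1, 0]
```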
classmethod from_pandas(s, nan_as_null=None)

Convert from a Pandas Series.

Parameters
sPandas Series object

A Pandas Series object which has to be converted to cuDF Series.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pds = pd.Series(data)
>>> cudf.Series.from_pandas(pds)
0    10.0
1    20.0
2    30.0
3    <NA>
dtype: float64
>>> cudf.Series.from_pandas(pds, nan_as_null=False)
0    10.0
1    20.0
2    30.0
3     NaN
dtype: float64
ge(other, fill_value=None, axis=0)

Greater than or equal to of series and other, element-wise (binary operator ge).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.ge(b)
a     True
b    False
c     True
d    False
e    False
f    False
g    False
dtype: bool
groupby(by=None, axis=0, level=None, as_index=True, sort=False, group_keys=True, squeeze=False, observed=False, dropna=True)

Group Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters
bymapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a cupy array is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

levelint, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

as_indexbool, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.

sortbool, default False

Sort result by group key. Unlike pandas, cudf defaults to False for better performance. Note that this does not influence the order of observations within each group: groupby preserves the order of rows within each group.

Returns
SeriesGroupBy

Returns a groupby object that contains information about the groups.

Examples

>>> import cudf
>>> ser = cudf.Series([390., 350., 30., 20.],
...                 index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                 name="Max Speed")
>>> ser
Falcon    390.0
Falcon    350.0
Parrot     30.0
Parrot     20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(ser > 100).mean()
Max Speed
False     25.0
True     370.0
Name: Max Speed, dtype: float64
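
The split, apply, and combine steps described above can be sketched without a GPU in plain Python. This is a conceptual illustration only, not how cuDF executes a groupby; group_mean is a hypothetical helper name.

```python
from collections import defaultdict


def group_mean(keys, values):
    """Split values by key, average each group, and combine into a dict."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)                 # split
    return {key: sum(vals) / len(vals)            # apply and combine
            for key, vals in groups.items()}


labels = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
speeds = [390.0, 350.0, 30.0, 20.0]
print(group_mean(labels, speeds))
# {'Falcon': 370.0, 'Parrot': 25.0}
print(group_mean([s > 100 for s in speeds], speeds))
# {True: 370.0, False: 25.0}
```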
gt(other, fill_value=None, axis=0)

Greater than of series and other, element-wise (binary operator gt).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.gt(b)
a     True
b    False
c     True
d    False
e    False
f    False
g    False
dtype: bool
property has_nulls

Indicator whether Series contains null values.

Returns
outbool

If the Series has at least one null value, return True; otherwise return False.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, None, 3, 4])
>>> series
0       1
1       2
2    <NA>
3       3
4       4
dtype: int64
>>> series.has_nulls
True
>>> series.dropna().has_nulls
False
hash_encode(stop, use_name=False)

Encode column values as ints in [0, stop) using a hash function.

Parameters
stopint

The upper bound on the encoding range.

use_namebool

If True, combine hashed column values with the hashed column name. This is useful when the same values in different columns should be encoded with different hashed values.

Returns
resultSeries

The encoded Series.

Examples

>>> import cudf
>>> series = cudf.Series([10, 120, 30])
>>> series.hash_encode(stop=200)
0     53
1     51
2    124
dtype: int32

You can include the column name while hash encoding by specifying use_name=True:

>>> series.hash_encode(stop=200, use_name=True)
0    131
1     29
2     76
dtype: int32
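
The idea of encoding values into [0, stop) can be sketched as a hash followed by a modulo. The example below uses CRC32 from the standard library purely as a stand-in, so its numbers will not match the cuDF output above; hash_encode_sketch and its name argument are hypothetical.

```python
import zlib


def hash_encode_sketch(values, stop, name=None):
    """Map each value to an int in [0, stop); optionally mix in a column name."""
    if stop <= 0:
        raise ValueError("stop must be a positive integer")
    encoded = []
    for value in values:
        # Mixing in the name makes equal values in different columns differ.
        key = repr(value) if name is None else f"{name}:{value!r}"
        encoded.append(zlib.crc32(key.encode("utf-8")) % stop)
    return encoded


codes = hash_encode_sketch([10, 120, 30], stop=200)
print(all(0 <= c < 200 for c in codes))   # True
```

Because the result is reduced modulo stop, distinct inputs can collide; a larger stop lowers that chance.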
hash_values()

Compute the hash of values in this column.

Returns
cupy array

A cupy array with hash values.

Examples

>>> import cudf
>>> series = cudf.Series([10, 120, 30])
>>> series
0     10
1    120
2     30
dtype: int64
>>> series.hash_values()
array([-1930516747,   422619251,  -941520876], dtype=int32)
head(n=5)

Return the first n rows. This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it. For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].

Parameters
nint, default 5

Number of rows to select.

Returns
same type as caller

The first n rows of the caller object.

See also

Series.tail

Returns the last n rows.

Examples

>>> import cudf
>>> ser = cudf.Series(['alligator', 'bee', 'falcon',
... 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra'])
>>> ser
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
6        shark
7        whale
8        zebra
dtype: object

Viewing the first 5 lines

>>> ser.head()
0    alligator
1          bee
2       falcon
3         lion
4       monkey
dtype: object

Viewing the first n lines (three in this case)

>>> ser.head(3)
0    alligator
1          bee
2       falcon
dtype: object

For negative values of n

>>> ser.head(-3)
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
dtype: object
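
The positive and negative cases of n both correspond to ordinary Python slicing, which the following plain-list sketch illustrates. head_sketch is a hypothetical helper, not part of cuDF.

```python
def head_sketch(rows, n=5):
    """First n rows; for negative n, every row except the last abs(n)."""
    return rows[:n]


animals = ['alligator', 'bee', 'falcon', 'lion', 'monkey',
           'parrot', 'shark', 'whale', 'zebra']
print(head_sketch(animals, 3))
# ['alligator', 'bee', 'falcon']
print(head_sketch(animals, -3))
# ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot']
```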
property iloc

Select values by position.

Examples

>>> import cudf
>>> s = cudf.Series([10, 20, 30])
>>> s
0    10
1    20
2    30
dtype: int64
>>> s.iloc[2]
30
property index

The index object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
property is_monotonic

Return a boolean indicating whether the values in the object are monotonically increasing.

Returns
outbool
property is_monotonic_decreasing

Return a boolean indicating whether the values in the object are monotonically decreasing.

Returns
outbool
property is_monotonic_increasing

Return a boolean indicating whether the values in the object are monotonically increasing.

Returns
outbool
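
The monotonicity properties carry no examples, so here is a minimal plain-Python sketch of the checks. It assumes the non-strict convention used by pandas, in which equal neighbours are allowed; the text above does not state this explicitly. The function names are illustrative only.

```python
def is_monotonic_increasing(values):
    """True if every value is >= the one before it (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is <= the one before it (ties allowed)."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing([1, 2, 2, 3]))   # True
print(is_monotonic_increasing([1, 3, 2]))      # False
print(is_monotonic_decreasing([3, 2, 2, 1]))   # True
```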
property is_unique

Return a boolean indicating whether the values in the object are unique.

Returns
outbool
isin(values)

Check whether values are contained in Series.

Parameters
valuesset or list-like

The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.

Returns
resultSeries

Series of booleans indicating if each element is in values.

Raises
TypeError

If values is a string

Examples

>>> import cudf
>>> s = cudf.Series(['lama', 'cow', 'lama', 'beetle', 'lama',
...                'hippo'], name='animal')
>>> s.isin(['cow', 'lama'])
0     True
1     True
2     True
3    False
4     True
5    False
Name: animal, dtype: bool

Passing a single string as s.isin('lama') will raise an error. Use a list of one element instead:

>>> s.isin(['lama'])
0     True
1    False
2     True
3    False
4     True
5    False
Name: animal, dtype: bool

Strings and integers are distinct and are therefore not comparable:

>>> cudf.Series([1]).isin(['1'])
0    False
dtype: bool
>>> cudf.Series([1.1]).isin(['1.1'])
0    False
dtype: bool
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
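
The rule for what counts as missing can be restated in plain Python, with None standing in for a masked null. This sketch covers only None and float NaN (NaT has no standard-library equivalent) and is_na is a hypothetical helper, not the cuDF implementation.

```python
import math


def is_na(value):
    """True for None (a null) and float NaN; False for everything else."""
    return value is None or (isinstance(value, float) and math.isnan(value))


samples = [5.0, None, float('nan'), float('inf'), float('-inf'), '']
print([is_na(v) for v in samples])
# [False, True, True, False, False, False]
```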
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
keys()

Return alias for index.

Returns
Index

Index of the Series.

Examples

>>> import cudf
>>> sr = cudf.Series([10, 11, 12, 13, 14, 15])
>>> sr
0    10
1    11
2    12
3    13
4    14
5    15
dtype: int64
>>> sr.keys()
RangeIndex(start=0, stop=6)
>>> sr = cudf.Series(['a', 'b', 'c'])
>>> sr
0    a
1    b
2    c
dtype: object
>>> sr.keys()
RangeIndex(start=0, stop=3)
>>> sr = cudf.Series([1, 2, 3], index=['a', 'b', 'c'])
>>> sr
a    1
b    2
c    3
dtype: int64
>>> sr.keys()
StringIndex(['a' 'b' 'c'], dtype='object')
kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series.kurtosis()
-1.1999999999999904
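
For reference, the usual bias-corrected sample excess kurtosis can be computed by hand as shown below. This is a sketch that assumes the standard estimator, which reproduces the value in the example above up to floating-point rounding; it is not taken from the cuDF source, and fisher_kurtosis is a hypothetical name.

```python
def fisher_kurtosis(values):
    """Bias-corrected excess kurtosis (a normal distribution gives 0.0)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    g2 = m4 / m2 ** 2 - 3                           # biased excess kurtosis
    # small-sample correction
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


print(round(fisher_kurtosis([1, 2, 3, 4]), 6))   # -1.2
```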
kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series.kurtosis()
-1.1999999999999904
label_encoding(cats, dtype=None, na_sentinel=-1)

Perform label encoding

Parameters
catssequence of input values
dtypenumpy.dtype; optional

Specifies the output dtype. If None is given, the smallest possible integer dtype (starting with np.int8) is used.

na_sentinelnumber, default -1

Value to indicate missing category.

Returns
A sequence of encoded labels with values between 0 and n-1, where n is the number of classes in cats.

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 10])
>>> s.label_encoding([2, 3])
0   -1
1    0
2    1
3   -1
4   -1
dtype: int8

The na_sentinel parameter controls the value assigned when there is no encoding.

>>> s.label_encoding([2, 3], na_sentinel=10)
0    10
1     0
2     1
3    10
4    10
dtype: int8

When none of the cats values exist in s, the entire Series will be na_sentinel.

>>> s.label_encoding(['a', 'b', 'c'])
0   -1
1   -1
2   -1
3   -1
4   -1
dtype: int8
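
The encoding rule shown in these examples amounts to a position lookup in cats with a fallback. A minimal plain-Python sketch follows; label_encode_sketch is a hypothetical helper, and dtype selection is not modeled.

```python
def label_encode_sketch(values, cats, na_sentinel=-1):
    """Replace each value with its position in cats, or na_sentinel if absent."""
    codes = {cat: code for code, cat in enumerate(cats)}
    return [codes.get(value, na_sentinel) for value in values]


s = [1, 2, 3, 4, 10]
print(label_encode_sketch(s, [2, 3]))                   # [-1, 0, 1, -1, -1]
print(label_encode_sketch(s, [2, 3], na_sentinel=10))   # [10, 0, 1, 10, 10]
print(label_encode_sketch(s, ['a', 'b', 'c']))          # [-1, -1, -1, -1, -1]
```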
le(other, fill_value=None, axis=0)

Less than or equal to of series and other, element-wise (binary operator le).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.le(b, fill_value=-10)
a    False
b     True
c    False
d    False
e     <NA>
f    False
g    False
dtype: bool
property loc

Select values by label.

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12], index=['a', 'b', 'c'])
>>> series
a    10
b    11
c    12
dtype: int64
>>> series.loc['b']
11
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
lt(other, fill_value=None, axis=0)

Less than of series and other, element-wise (binary operator lt).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.lt(b, fill_value=-10)
a    False
b     True
c    False
d    False
e     <NA>
f    False
g    False
dtype: bool
map(arg, na_action=None) → Series

Map values of Series according to input correspondence.

Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

Parameters
argfunction, collections.abc.Mapping subclass or Series

Mapping correspondence.

na_action{None, ‘ignore’}, default None

If ‘ignore’, propagate NaN values, without passing them to the mapping correspondence.

Returns
Series

Same index as caller.

Notes

Please note map currently only supports fixed-width numeric type functions.

Examples

>>> import cudf
>>> import numpy as np
>>> s = cudf.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0      cat
1      dog
2     <NA>
3   rabbit
dtype: object

map accepts a dict or a Series. Values that are not found in the dict are converted to <NA>; default values in dicts are currently not supported:

>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0   kitten
1    puppy
2     <NA>
3     <NA>
dtype: object

It also accepts numeric functions:

>>> s = cudf.Series([1, 2, 3, 4, np.nan])
>>> s.map(lambda x: x ** 2)
0       1
1       4
2       9
3      16
4    <NA>
dtype: int64
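
The dict-lookup behaviour described above, where unmatched values come back as missing, can be sketched with plain lists and None standing in for a null. map_values is a hypothetical helper, not cuDF code.

```python
def map_values(values, mapping):
    """Look up each value in mapping; nulls and unmatched keys become None."""
    return [None if value is None else mapping.get(value) for value in values]


pets = ['cat', 'dog', None, 'rabbit']
print(map_values(pets, {'cat': 'kitten', 'dog': 'puppy'}))
# ['kitten', 'puppy', None, None]
```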
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
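
The replacement rule for the Series case can be restated with plain lists, using None for a null. mask_values is a hypothetical helper that handles only a scalar other; it is meant as an illustration of the semantics rather than the cuDF implementation.

```python
def mask_values(values, cond, other=None):
    """Keep values[i] where cond[i] is False; otherwise substitute other."""
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask_values(ser, cond, 10))   # [10, 10, 2, 1, 0]
print(mask_values(ser, cond))       # [None, None, 2, 1, 0]
```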
max(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)

Return the maximum of the values in the Series.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

dtypedata type

Data type to cast the result to.

Returns
scalar

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.max()
5
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values in the series.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 3, 25, 24, 6])
>>> ser.mean()
15.5
median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 3, 25, 24, 6])
>>> ser
0    10
1    25
2     3
3    25
4    24
5     6
dtype: int64
>>> ser.median()
17.0
memory_usage(index=True, deep=False)

Return the memory usage of the Series.

The memory usage can optionally include the contribution of the index and of elements of object dtype.

Parameters
indexbool, default True

Specifies whether to include the memory usage of the Series index.

deepbool, default False

If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned value.

Returns
int

Bytes of memory consumed.

See also

cudf.core.dataframe.DataFrame.memory_usage

Bytes consumed by a DataFrame.

Examples

>>> import cudf
>>> s = cudf.Series(range(3), index=['a','b','c'])
>>> s.memory_usage()
48

Not including the index gives the size of the rest of the data, which is necessarily smaller:

>>> s.memory_usage(index=False)
24
min(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values in the Series.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

dtypedata type

Data type to cast the result to.

Returns
scalar

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.min()
1
mod(other, fill_value=None, axis=0)

Modulo of series and other, element-wise (binary operator mod).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> series = cudf.Series([10, 20, 30])
>>> series
0    10
1    20
2    30
dtype: int64
>>> series.mod(4)
0    2
1    0
2    2
dtype: int64
mode(dropna=True)

Return the mode(s) of the dataset.

Always returns Series even if only one value is returned.

Parameters
dropnabool, default True

Don’t consider counts of NA/NaN/NaT.

Returns
Series

Modes of the Series in sorted order.

Examples

>>> import cudf
>>> series = cudf.Series([7, 6, 5, 4, 3, 2, 1])
>>> series
0    7
1    6
2    5
3    4
4    3
5    2
6    1
dtype: int64
>>> series.mode()
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64

We can include <NA> values in mode by passing dropna=False.

>>> series = cudf.Series([7, 4, 3, 3, 7, None, None])
>>> series
0       7
1       4
2       3
3       3
4       7
5    <NA>
6    <NA>
dtype: int64
>>> series.mode()
0    3
1    7
dtype: int64
>>> series.mode(dropna=False)
0       3
1       7
2    <NA>
dtype: int64
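
The effect of dropna on the result can be reproduced with a frequency count in plain Python, using None for a null. This is a sketch of the documented behaviour only; modes_sketch is a hypothetical name, and placing the null last mirrors the example output above.

```python
from collections import Counter


def modes_sketch(values, dropna=True):
    """Return the most frequent value(s) in sorted order, None last."""
    if dropna:
        values = [v for v in values if v is not None]
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    winners = [v for v, count in counts.items() if count == top]
    # None cannot be compared with numbers, so sort it after everything else.
    return sorted(winners, key=lambda v: (v is None, 0 if v is None else v))


data = [7, 4, 3, 3, 7, None, None]
print(modes_sketch(data))                 # [3, 7]
print(modes_sketch(data, dropna=False))   # [3, 7, None]
```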
mul(other, fill_value=None, axis=0)

Multiplication of series and other, element-wise (binary operator mul).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       2
e    <NA>
dtype: int64
>>> a.multiply(b, fill_value=0)
a       1
b       0
c       0
d       0
e    <NA>
dtype: int64
multiply(other, fill_value=None, axis=0)

Multiplication of series and other, element-wise (binary operator mul).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       2
e    <NA>
dtype: int64
>>> a.multiply(b, fill_value=0)
a       1
b       0
c       0
d       0
e    <NA>
dtype: int64
property name

Returns name of the Series.

nans_to_nulls()

Convert NaN values (if any) to nulls.

Returns
Series

Examples

>>> import cudf
>>> import numpy as np
>>> series = cudf.Series([1, 2, np.nan, None, 10], nan_as_null=False)
>>> series
0     1.0
1     2.0
2     NaN
3    <NA>
4    10.0
dtype: float64
>>> series.nans_to_nulls()
0     1.0
1     2.0
2    <NA>
3    <NA>
4    10.0
dtype: float64
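
The distinction between NaN (a floating-point value) and a null (a missing entry) can be seen in a plain-list sketch, with None standing in for a null. nans_to_nulls_sketch is a hypothetical helper, not the cuDF implementation.

```python
import math


def nans_to_nulls_sketch(values):
    """Replace float NaN entries with None; leave everything else untouched."""
    return [None if isinstance(v, float) and math.isnan(v) else v
            for v in values]


print(nans_to_nulls_sketch([1.0, 2.0, float('nan'), None, 10.0]))
# [1.0, 2.0, None, None, 10.0]
```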
property ndim

Dimension of the data. Series ndim is always 1.

ne(other, fill_value=None, axis=0)

Not equal to of series and other, element-wise (binary operator ne).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 10, 20], index=['a', 'c', 'd', 'e', 'f', 'g'])
>>> a
a       1
c       2
d       3
e    <NA>
f      10
g      20
dtype: int64
>>> b = cudf.Series([-10, 23, -1, None, None], index=['a', 'b', 'c', 'd', 'e'])
>>> b
a     -10
b      23
c      -1
d    <NA>
e    <NA>
dtype: int64
>>> a.ne(b, fill_value=2)
a    True
b    True
c    True
d    True
e    <NA>
f    True
g    True
dtype: bool
nlargest(n=5, keep='first')

Returns a new Series of the n largest elements.

Parameters
nint, default 5

Return this many descending sorted values.

keep{‘first’, ‘last’}, default ‘first’

When there are duplicate values that cannot all fit in a Series of n elements:

  • first : return the first n occurrences in order of appearance.

  • last : return the last n occurrences in reverse order of appearance.

Returns
Series

The n largest values in the Series, sorted in decreasing order.

Examples

>>> import cudf
>>> countries_population = {"Italy": 59000000, "France": 65000000,
...                         "Malta": 434000, "Maldives": 434000,
...                         "Brunei": 434000, "Iceland": 337000,
...                         "Nauru": 11300, "Tuvalu": 11300,
...                         "Anguilla": 11300, "Montserrat": 5200}
>>> series = cudf.Series(countries_population)
>>> series
Italy         59000000
France        65000000
Malta           434000
Maldives        434000
Brunei          434000
Iceland         337000
Nauru            11300
Tuvalu           11300
Anguilla         11300
Montserrat        5200
dtype: int64
>>> series.nlargest()
France      65000000
Italy       59000000
Malta         434000
Maldives      434000
Brunei        434000
dtype: int64
>>> series.nlargest(3)
France    65000000
Italy     59000000
Malta       434000
dtype: int64
>>> series.nlargest(3, keep='last')
France    65000000
Italy     59000000
Brunei      434000
dtype: int64
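
The tie-breaking effect of keep can be sketched with a stable sort over (label, value) pairs. This illustrates the documented behaviour and relies on the stability of Python's sorted; it says nothing about how cuDF performs the selection, and nlargest_sketch is a hypothetical name.

```python
def nlargest_sketch(items, n=5, keep='first'):
    """items: (label, value) pairs in order of appearance."""
    if keep not in ('first', 'last'):
        raise ValueError("keep must be 'first' or 'last'")
    # For keep='last', walk the input backwards so later duplicates win ties.
    ordered = items if keep == 'first' else items[::-1]
    # sorted() is stable, so equal values keep the relative order of `ordered`.
    return sorted(ordered, key=lambda pair: pair[1], reverse=True)[:n]


population = [("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
              ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
              ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
              ("Montserrat", 5200)]
print([label for label, _ in nlargest_sketch(population, 3)])
# ['France', 'Italy', 'Malta']
print([label for label, _ in nlargest_sketch(population, 3, keep='last')])
# ['France', 'Italy', 'Brunei']
```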
notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
nsmallest(n=5, keep='first')

Returns a new Series of the n smallest elements.

Parameters
nint, default 5

Return this many ascending sorted values.

keep{‘first’, ‘last’}, default ‘first’

When there are duplicate values that cannot all fit in a Series of n elements:

  • first : return the first n occurrences in order of appearance.

  • last : return the last n occurrences in reverse order of appearance.

Returns
Series

The n smallest values in the Series, sorted in increasing order.

Examples

>>> import cudf
>>> countries_population = {"Italy": 59000000, "France": 65000000,
...                         "Brunei": 434000, "Malta": 434000,
...                         "Maldives": 434000, "Iceland": 337000,
...                         "Nauru": 11300, "Tuvalu": 11300,
...                         "Anguilla": 11300, "Montserrat": 5200}
>>> s = cudf.Series(countries_population)
>>> s
Italy       59000000
France      65000000
Brunei        434000
Malta         434000
Maldives      434000
Iceland       337000
Nauru          11300
Tuvalu         11300
Anguilla       11300
Montserrat      5200
dtype: int64

The n smallest elements where n=5 by default.

>>> s.nsmallest()
Montserrat    5200
Nauru        11300
Tuvalu       11300
Anguilla     11300
Iceland     337000
dtype: int64

The n smallest elements where n=3. Default keep value is ‘first’ so Nauru and Tuvalu will be kept.

>>> s.nsmallest(3)
Montserrat   5200
Nauru       11300
Tuvalu      11300
dtype: int64

The n smallest elements where n=3 and keeping the last duplicates. Anguilla and Tuvalu will be kept since they are the last with value 11300 based on the index order.

>>> s.nsmallest(3, keep='last')
Montserrat   5200
Anguilla    11300
Tuvalu      11300
dtype: int64
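
The effect of keep on ties can be sketched in plain Python. This is only an illustration of the behaviour described above, not cuDF's implementation; the helper name nsmallest_sketch and the (label, value) pair representation are assumptions made for the example.

```python
def nsmallest_sketch(items, n=5, keep="first"):
    """Return the n smallest (label, value) pairs.

    items is a list of (label, value) pairs in order of appearance.
    """
    indexed = list(enumerate(items))
    if keep == "first":
        # Ties are broken by earliest position.
        key = lambda entry: (entry[1][1], entry[0])
    elif keep == "last":
        # Ties are broken by latest position (reverse order of appearance).
        key = lambda entry: (entry[1][1], -entry[0])
    else:
        raise ValueError("keep must be 'first' or 'last'")
    return [item for _, item in sorted(indexed, key=key)[:n]]


population = [
    ("Italy", 59000000), ("France", 65000000), ("Brunei", 434000),
    ("Malta", 434000), ("Maldives", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
    ("Montserrat", 5200),
]

first3 = nsmallest_sketch(population, 3)
last3 = nsmallest_sketch(population, 3, keep="last")
print(first3)  # Montserrat, Nauru, Tuvalu
print(last3)   # Montserrat, Anguilla, Tuvalu
```

Both results agree with the s.nsmallest(3) and s.nsmallest(3, keep='last') outputs shown above.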
property null_count

Number of null values

property nullable

A boolean indicating whether a null-mask is needed

property nullmask

The gpu buffer for the null-mask

nunique(method='sort', dropna=True)

Returns the number of unique values in the Series.

Excludes NA values by default.

Parameters
dropnabool, default True

Don’t include NA values in the count.

Returns
int

Examples

>>> import cudf
>>> s = cudf.Series([1, 3, 5, 7, 7])
>>> s
0    1
1    3
2    5
3    7
4    7
dtype: int64
>>> s.nunique()
4
one_hot_encoding(cats, dtype='float64')

Perform one-hot-encoding

Parameters
catssequence of values

values representing each category.

dtypenumpy.dtype

specifies the output dtype.

Returns
Sequence

A sequence of new series for each category. Its length is determined by the length of cats.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'b', 'c', 'a'])
>>> s
0    a
1    b
2    c
3    a
dtype: object
>>> s.one_hot_encoding(['a', 'c', 'b'])
[0    1.0
1    0.0
2    0.0
3    1.0
dtype: float64, 0    0.0
1    0.0
2    1.0
3    0.0
dtype: float64, 0    0.0
1    1.0
2    0.0
3    0.0
dtype: float64]
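
What the encoding computes can be sketched in plain Python: one output sequence per entry of cats, holding 1 where the value equals that category and 0 elsewhere. The helper name one_hot_sketch is hypothetical, plain lists stand in for Series, and null handling is not modelled.

```python
def one_hot_sketch(values, cats, dtype=float):
    """Return one indicator list per category, in the order given by cats."""
    return [[dtype(value == cat) for value in values] for cat in cats]


encoded = one_hot_sketch(["a", "b", "c", "a"], ["a", "c", "b"])
for cat, column in zip(["a", "c", "b"], encoded):
    print(cat, column)
```

The three lists correspond to the three Series in the example above.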
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
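
The two calling conventions can be sketched with a standalone function. This is a minimal illustration of the dispatch described above, not cuDF's code; pipe_sketch, add_offset and scale_by are hypothetical names, and plain lists stand in for Series.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Apply func to obj, supporting the (callable, data_keyword) tuple form."""
    if isinstance(func, tuple):
        target, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        # Pass the data under the keyword the callable expects.
        kwargs[data_keyword] = obj
        return target(*args, **kwargs)
    # Default form: the data is the first positional argument.
    return func(obj, *args, **kwargs)


def add_offset(values, offset):
    return [v + offset for v in values]


def scale_by(factor, data):
    # Takes the data as its second argument, named "data".
    return [factor * v for v in data]


step1 = pipe_sketch([1, 2, 3], add_offset, offset=1)
result = pipe_sketch(step1, (scale_by, "data"), factor=10)
print(result)
```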
pow(other, fill_value=None, axis=0)

Exponential power of series and other, element-wise (binary operator pow).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([10, None, 12, None], index=['a', 'b', 'd', 'e'])
>>> b
a      10
b    <NA>
d      12
e    <NA>
dtype: int64
>>> a.pow(b, fill_value=0)
a       1
b       1
c       1
d       0
e    <NA>
dtype: int64
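
The fill_value rule shared by pow and the other binary operators can be sketched in plain Python. This is an illustration only: dicts stand in for indexed Series, None stands in for <NA>, a label missing from one side is treated as null on that side, and the helper name binary_op_sketch is hypothetical.

```python
import operator


def binary_op_sketch(left, right, op, fill_value=None):
    """Align two label->value dicts and apply op element-wise."""
    labels = list(left) + [label for label in right if label not in left]
    out = {}
    for label in labels:
        x = left.get(label)
        y = right.get(label)
        if x is None and y is None:
            # Null on both sides stays null, even with fill_value.
            out[label] = None
            continue
        if fill_value is not None:
            x = fill_value if x is None else x
            y = fill_value if y is None else y
        out[label] = None if (x is None or y is None) else op(x, y)
    return out


a = {"a": 1, "b": 2, "c": 3, "d": None}
b = {"a": 10, "b": None, "d": 12, "e": None}

powered = binary_op_sketch(a, b, operator.pow, fill_value=0)
# The reflected form swaps the operands: other ** self.
reflected = binary_op_sketch(a, b, lambda x, y: y ** x, fill_value=0)
print(powered)
print(reflected)
```

The first result agrees with the a.pow(b, fill_value=0) output above.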
prod(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return product of the values in the series

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

dtypedata type

Data type to cast the result to.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
scalar

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.prod()
120
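
How skipna and min_count interact can be sketched in plain Python. This is a sketch under stated assumptions, not cuDF's implementation: None stands in for <NA>, the helper name prod_sketch is hypothetical, and returning NA when skipna=False and a null is present is an assumption of the example.

```python
import math


def prod_sketch(values, skipna=True, min_count=0):
    """Product of the non-null values, or None (NA) if too few are valid."""
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        return None  # assumed: a null propagates when skipna is False
    if len(valid) < min_count:
        return None
    return math.prod(valid)  # the product of an empty list is 1


print(prod_sketch([1, 5, 2, 4, 3]))            # 120
print(prod_sketch([None, None]))               # 1
print(prod_sketch([None, None], min_count=1))  # None
```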
product(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return product of the values in the Series.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

dtypedata type

Data type to cast the result to.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
scalar

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.product()
120
quantile(q=0.5, interpolation='linear', exact=True, quant_index=True)

Return values at the given quantile.

Parameters
qfloat or array-like, default 0.5 (50% quantile)

0 <= q <= 1, the quantile(s) to compute

interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}, default ‘linear’

This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j.

columnslist of str

List of column names to include.

exactboolean

Whether to use approximate or exact quantile algorithm.

quant_indexboolean

Whether to use the list of quantiles as index.

Returns
float or Series

If q is an array, a Series will be returned where the index is q and the values are the quantiles, otherwise a float will be returned.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series
0    1
1    2
2    3
3    4
dtype: int64
>>> series.quantile(0.5)
2.5
>>> series.quantile([0.25, 0.5, 0.75])
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64
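
The interpolation options can be sketched in plain Python. The definitions below follow the convention used by NumPy and pandas, and it is an assumption of this example that cuDF matches them; the helper name quantile_sketch is hypothetical, and tie handling for 'nearest' is a choice made for the sketch.

```python
import math


def quantile_sketch(values, q, interpolation="linear"):
    """Quantile of a non-empty list, with 0 <= q <= 1."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    i, j = data[lo], data[hi]          # the two neighbouring data points
    frac = pos - lo
    if interpolation == "linear":
        return i + (j - i) * frac
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        return i if frac <= 0.5 else j  # tie goes to i in this sketch
    raise ValueError(f"unknown interpolation: {interpolation}")


data = [1, 2, 3, 4]
linear_quantiles = [quantile_sketch(data, q) for q in (0.25, 0.5, 0.75)]
print(linear_quantiles)
```

With 'linear', the values agree with the series.quantile([0.25, 0.5, 0.75]) output above.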
radd(other, fill_value=None, axis=0)

Addition of series and other, element-wise (binary operator radd).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 1, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       1
e    <NA>
dtype: int64
>>> a.radd(b, fill_value=0)
a       2
b       2
c       3
d       1
e    <NA>
dtype: int64
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
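
The tie-breaking methods can be compared with a plain-Python sketch. It covers ascending ranks of a list without nulls only (na_option and pct are not modelled), and the helper name rank_sketch is hypothetical.

```python
def rank_sketch(values, method="average"):
    """Ascending ranks (1 through n) for a list without nulls."""
    # Stable sort: tied values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    dense = 0
    while start < len(order):
        # Find the end of the run of equal values beginning at start.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, idx in enumerate(order[start:end + 1]):
            if method == "average":
                rank = (start + end) / 2 + 1
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = end + 1
            elif method == "first":
                rank = start + offset + 1
            elif method == "dense":
                rank = dense
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[idx] = float(rank)
        start = end + 1
    return ranks


values = [7, 2, 7, 1]
for method in ("average", "min", "max", "first", "dense"):
    print(method, rank_sketch(values, method))
```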

reindex(index=None, copy=True)

Return a Series that conforms to a new index

Parameters
indexIndex, Series-convertible, default None
copyboolean, default True
Returns
A new Series that conforms to the supplied index

Examples

>>> import cudf
>>> series = cudf.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
>>> series
a    10
b    20
c    30
d    40
dtype: int64
>>> series.reindex(['a', 'b', 'y', 'z'])
a      10
b      20
y    <NA>
z    <NA>
dtype: int64
rename(index=None, copy=True)

Alter Series name

Change Series.name with a scalar value

Parameters
indexScalar, optional

Scalar to alter the Series.name attribute

copyboolean, default True

Also copy underlying data

Returns
Series

Notes

Difference from pandas:
  • Supports scalar values only for changing name attribute

  • Not supported: inplace, level

Examples

>>> import cudf
>>> series = cudf.Series([10, 20, 30])
>>> series
0    10
1    20
2    30
dtype: int64
>>> series.name
>>> renamed_series = series.rename('numeric_series')
>>> renamed_series
0    10
1    20
2    30
Name: numeric_series, dtype: int64
>>> renamed_series.name
'numeric_series'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method=None)

Replace values given in to_replace with value.

Parameters
to_replacenumeric, str or list-like

Value(s) to replace.

  • numeric or str:
    • values equal to to_replace will be replaced with value

  • list of numeric or str:
    • If value is also list-like, to_replace and value must be of same length.

  • dict:
    • Dicts can be used to specify different replacement values for different existing values. For example, {‘a’: ‘b’, ‘y’: ‘z’} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way the value parameter should be None.

valuescalar, dict, list-like, str, default None

Value to replace any values matching to_replace with.

inplacebool, default False

If True, perform the operation in place.

Returns
resultSeries

Series after replacement. The mask and index are preserved.

Raises
TypeError
  • If to_replace is not a scalar, array-like, dict, or None

  • If to_replace is a dict and value is not a list, dict, or Series

ValueError
  • If a list is passed to to_replace and value but they are not the same length.

See also

Series.fillna

Notes

Parameters that are currently not supported are: limit, regex, method

Examples

Scalar to_replace and value

>>> import cudf
>>> s = cudf.Series([0, 1, 2, 3, 4])
>>> s
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

List-like to_replace

>>> s.replace([1, 2], 10)
0     0
1    10
2    10
3     3
4     4
dtype: int64

dict-like to_replace

>>> s.replace({1:5, 3:50})
0     0
1     5
2     2
3    50
4     4
dtype: int64
>>> s = cudf.Series(['b', 'a', 'a', 'b', 'a'])
>>> s
0     b
1     a
2     a
3     b
4     a
dtype: object
>>> s.replace({'a': None})
0       b
1    <NA>
2    <NA>
3       b
4    <NA>
dtype: object

If the types of the values in to_replace and value do not match the dtype of the Series, cudf behaves differently from pandas: the mismatched pairs are silently ignored:

>>> s = cudf.Series(['b', 'a', 'a', 'b', 'a'])
>>> s
0    b
1    a
2    a
3    b
4    a
dtype: object
>>> s.replace('a', 1)
0    b
1    a
2    a
3    b
4    a
dtype: object
>>> s.replace(['a', 'c'], [1, 2])
0    b
1    a
2    a
3    b
4    a
dtype: object
reset_index(drop=False, inplace=False)

Reset index to RangeIndex

Parameters
dropbool, default False

Just reset the index, without inserting it as a column in the new DataFrame.

inplacebool, default False

Modify the Series in place (do not create a new object).

Returns
Series or DataFrame or None

When drop is False (the default), a DataFrame is returned. The newly created columns will come first in the DataFrame, followed by the original Series values. When drop is True, a Series is returned. In either case, if inplace=True, no value is returned.

Examples

>>> import cudf
>>> series = cudf.Series(['a', 'b', 'c', 'd'], index=[10, 11, 12, 13])
>>> series
10    a
11    b
12    c
13    d
dtype: object
>>> series.reset_index()
   index  0
0     10  a
1     11  b
2     12  c
3     13  d
>>> series.reset_index(drop=True)
0    a
1    b
2    c
3    d
dtype: object
reverse()

Reverse the Series

Returns
Series

A reversed Series.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4, 5, 6])
>>> series
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
>>> series.reverse()
5    6
4    5
3    4
2    3
1    2
0    1
dtype: int64
rfloordiv(other, fill_value=None, axis=0)

Integer division of series and other, element-wise (binary operator rfloordiv).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

Result of the arithmetic operation.

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 10, 17])
>>> s
0     1
1     2
2    10
3    17
dtype: int64
>>> s.rfloordiv(100)
0    100
1     50
2     10
3      5
dtype: int64
>>> s = cudf.Series([10, 20, None])
>>> s
0      10
1      20
2    <NA>
dtype: int64
>>> s.rfloordiv(200)
0      20
1      10
2    <NA>
dtype: int64
>>> s.rfloordiv(200, fill_value=2)
0     20
1     10
2    100
dtype: int64
rmod(other, fill_value=None, axis=0)

Modulo of series and other, element-wise (binary operator rmod).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([10, 20, None, 30, 40], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a      10
b      20
c    <NA>
d      30
e      40
dtype: int64
>>> b = cudf.Series([None, 1, 20, 5, 4], index=['a', 'b', 'd', 'e', 'f'])
>>> b
a    <NA>
b       1
d      20
e       5
f       4
dtype: int64
>>> a.rmod(b, fill_value=10)
a       0
b       1
c    <NA>
d      20
e       5
f       4
dtype: int64
rmul(other, fill_value=None, axis=0)

Multiplication of series and other, element-wise (binary operator rmul).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([10, 20, None, 30, 40], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a      10
b      20
c    <NA>
d      30
e      40
dtype: int64
>>> b = cudf.Series([None, 1, 20, 5, 4], index=['a', 'b', 'd', 'e', 'f'])
>>> b
a    <NA>
b       1
d      20
e       5
f       4
dtype: int64
>>> a.rmul(b, fill_value=2)
a      20
b      20
c    <NA>
d     600
e     200
f       8
dtype: int64
rolling(window, min_periods=None, center=False, axis=0, win_type=None)

Rolling window calculations.

Parameters
windowint or offset

Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. As opposed to a fixed window size, each window will be sized to accommodate observations within the time period specified by the offset.

min_periodsint, optional

The minimum number of observations in the window that are required to be non-null, so that the result is non-null. If not provided or None, min_periods is equal to the window size.

centerbool, optional

If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.

Returns
Rolling object.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4])

Rolling sum with window size 2.

>>> print(a.rolling(2).sum())
0
1    3
2    5
3
4
dtype: int64

Rolling sum with window size 2 and min_periods 1.

>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64

Rolling count with window size 3.

>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64

Rolling count with window size 3, but with the result set at the center of the window.

>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64

Rolling max with variable window size specified by an offset; only valid for datetime index.

>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000    1
2019-01-01T09:00:01.000    9
2019-01-01T09:00:02.000    9
2019-01-01T09:00:04.000    4
2019-01-01T09:00:07.000
2019-01-01T09:00:08.000    1
dtype: int64

Apply custom function on the window with the apply method

>>> import numpy as np
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     b = 0
...     for a in A:
...         b = b + math.sqrt(a)
...     return b
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64

And this also works for window rolling set by an offset

>>> import pandas as pd
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...          pd.Timestamp('20190101 09:00:00'),
...          pd.Timestamp('20190101 09:00:01'),
...          pd.Timestamp('20190101 09:00:02'),
...          pd.Timestamp('20190101 09:00:04'),
...          pd.Timestamp('20190101 09:00:07'),
...          pd.Timestamp('20190101 09:00:08')
...      ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64
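
The role of min_periods in a fixed-size window can be sketched in plain Python. This is an illustration only, not cuDF's implementation: None stands in for null, the result is set at the right edge of the window (center=False), and the helper name rolling_sum_sketch is hypothetical.

```python
def rolling_sum_sketch(values, window, min_periods=None):
    """Rolling sum over a trailing window of fixed size."""
    if min_periods is None:
        min_periods = window  # default: every slot in the window must be valid
    out = []
    for i in range(len(values)):
        # Trailing window ending at position i (shorter near the start).
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if v is not None]
        out.append(sum(valid) if len(valid) >= min_periods else None)
    return out


a = [1, 2, 3, None, 4]
strict = rolling_sum_sketch(a, 2)
relaxed = rolling_sum_sketch(a, 2, min_periods=1)
print(strict)   # nulls wherever fewer than 2 valid observations are in the window
print(relaxed)
```

Both results agree with the rolling(2).sum() and rolling(2, min_periods=1).sum() outputs near the start of the examples.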
round(decimals=0)

Round each value in a Series to the given number of decimals.

Parameters
decimalsint, default 0

Number of decimal places to round to. If decimals is negative, it specifies the number of positions to the left of the decimal point.

Returns
Series

Rounded values of the Series.

Examples

>>> import cudf
>>> s = cudf.Series([0.1, 1.4, 2.9])
>>> s.round()
0    0.0
1    1.0
2    3.0
dtype: float64
rpow(other, fill_value=None, axis=0)

Exponential power of series and other, element-wise (binary operator rpow).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([10, None, 12, None], index=['a', 'b', 'd', 'e'])
>>> b
a      10
b    <NA>
d      12
e    <NA>
dtype: int64
>>> a.rpow(b, fill_value=0)
a      10
b       0
c       0
d       1
e    <NA>
dtype: int64
rsub(other, fill_value=None, axis=0)

Subtraction of series and other, element-wise (binary operator rsub).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b       2
c       3
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       2
e    <NA>
dtype: int64
>>> a.rsub(b, fill_value=10)
a       0
b       8
c       7
d      -8
e    <NA>
dtype: int64
rtruediv(other, fill_value=None, axis=0)

Floating division of series and other, element-wise (binary operator rtruediv).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([10, 20, None, 30], index=['a', 'b', 'c', 'd'])
>>> a
a      10
b      20
c    <NA>
d      30
dtype: int64
>>> b = cudf.Series([1, None, 2, 3], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       2
e       3
dtype: int64
>>> a.rtruediv(b, fill_value=0)
a            0.1
b            0.0
c           <NA>
d    0.066666667
e            Inf
dtype: float64
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/"columns".

weightsstr or ndarray-like, optional

Only supported for axis=1/"columns".

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
scale()

Scale values to [0, 1] in float64

Returns
Series

A new series with values scaled to [0, 1].

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12, 0.5, 1])
>>> series
0    10.0
1    11.0
2    12.0
3     0.5
4     1.0
dtype: float64
>>> series.scale()
0    0.826087
1    0.913043
2    1.000000
3    0.000000
4    0.043478
dtype: float64
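
The result above is consistent with min-max scaling, (x - min) / (max - min). A plain-Python sketch, assuming the input has at least two distinct values and no nulls; the helper name scale_sketch is hypothetical.

```python
def scale_sketch(values):
    """Map values linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    span = hi - lo  # zero if all values are equal; not handled in this sketch
    return [(v - lo) / span for v in values]


scaled = scale_sketch([10, 11, 12, 0.5, 1])
print([round(v, 6) for v in scaled])
```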
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
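
The scatter step can be sketched in plain Python. This is an illustration under assumptions, not cuDF's implementation: rows are plain Python objects, map_index holds integer destinations in the range 0 to map_size - 1, and defaulting map_size to the largest destination plus one is a choice made for this sketch. The helper name scatter_sketch is hypothetical.

```python
def scatter_sketch(rows, map_index, map_size=None):
    """Split rows into map_size lists; row k goes to list map_index[k]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumption of this sketch: size the output from the largest destination.
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)  # original row order is kept within each part
    return parts


parts = scatter_sketch(["r0", "r1", "r2", "r3"], [1, 0, 1, 2])
padded = scatter_sketch(["r0", "r1"], [0, 0], map_size=3)
print(parts)
print(padded)
```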
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
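
For a single ascending column without nulls, the meaning of side can be sketched with the standard-library bisect module. This illustrates the semantics only and is not how cuDF computes the result; the helper name searchsorted_sketch is hypothetical.

```python
from bisect import bisect_left, bisect_right


def searchsorted_sketch(sorted_values, values, side="left"):
    """Insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
left = searchsorted_sketch(s, [1, 3], side="left")
right = searchsorted_sketch(s, [1, 3], side="right")
# Binary search assumes sorted input; on unsorted data the answer is unreliable.
unsorted = searchsorted_sketch([2, 1, 3], [1])
print(left, right, unsorted)
```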
set_index(index)

Returns a new Series with a different index.

Parameters
indexIndex, Series-convertible

the new index or values for the new index

Returns
Series

A new Series with assigned index.

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12, 13, 14])
>>> series
0    10
1    11
2    12
3    13
4    14
dtype: int64
>>> series.set_index(['a', 'b', 'c', 'd', 'e'])
a    10
b    11
c    12
d    13
e    14
dtype: int64
set_mask(mask, null_count=None)

Create new Series by setting a mask array.

This will override the existing mask. The returned Series will reference the same data buffer as this Series.

Parameters
mask1D array-like

The null-mask. Valid values are marked as 1; otherwise 0. The mask bit given the data index idx is computed as:

(mask[idx // 8] >> (idx % 8)) & 1
null_countint, optional

The number of null values. If None, it is calculated automatically.

Returns
Series

A new series with the applied mask.

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4, 5])
>>> ref_array = cudf.Series([10, None, 11, None, 16])
>>> series
0    1
1    2
2    3
3    4
4    5
dtype: int64
>>> ref_array
0      10
1    <NA>
2      11
3    <NA>
4      16
dtype: int64
>>> series.set_mask(ref_array._column.mask)
0       1
1    <NA>
2       3
3    <NA>
4       5
dtype: int64
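
The bit formula given for mask can be exercised in plain Python. This sketch packs one validity bit per element, least-significant bit first within each byte, and reads the bits back with the expression above; the helper names pack_mask and is_valid are hypothetical, and any padding a real GPU buffer may carry is not modelled.

```python
def pack_mask(valid_flags):
    """Pack booleans into bytes, one bit per element (1 = valid)."""
    mask = bytearray((len(valid_flags) + 7) // 8)
    for idx, flag in enumerate(valid_flags):
        if flag:
            mask[idx // 8] |= 1 << (idx % 8)
    return bytes(mask)


def is_valid(mask, idx):
    """Read the mask bit for element idx, as in the formula above."""
    return (mask[idx // 8] >> (idx % 8)) & 1


# Same validity pattern as ref_array in the example: 10, <NA>, 11, <NA>, 16.
mask = pack_mask([True, False, True, False, True])
bits = [is_valid(mask, idx) for idx in range(5)]
print(mask, bits)
```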
property shape

Returns a tuple representing the dimensionality of the Series.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
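
A plain-Python sketch of the idea, assuming the pandas-style convention that a positive periods moves values toward the end and vacated positions receive fill_value (None, standing in for <NA>, by default). The helper name shift_sketch is hypothetical, and freq and axis are not modelled.

```python
def shift_sketch(values, periods=1, fill_value=None):
    """Shift a list by periods positions, keeping its length."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill_value] * n  # everything is shifted out
    if periods > 0:
        return [fill_value] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


forward = shift_sketch([1, 2, 3, 4])
backward = shift_sketch([1, 2, 3, 4], periods=-1, fill_value=0)
print(forward, backward)
```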

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased Fisher-Pearson skew of a sample.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only

Examples

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4, 5, 6, 6])
>>> series
0    1
1    2
2    3
3    4
4    5
5    6
6    6
dtype: int64
>>> series.skew()
-0.288195490292614
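
The value above can be checked by hand. The sketch below assumes the adjusted Fisher-Pearson coefficient, G1 = g1 * sqrt(n(n-1)) / (n-2) with g1 = m3 / m2^1.5 computed from biased central moments. The helper `sample_skew` is hypothetical and written in plain Python; it is not part of cuDF and is not necessarily how cuDF computes the result.

```python
import math


def sample_skew(values):
    """Adjusted Fisher-Pearson skew of a list of numbers (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    # Biased (divide-by-n) second and third central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample correction that makes the estimate "unbiased".
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


result = sample_skew([1, 2, 3, 4, 5, 6, 6])
print(result)
```

The printed value should agree with `series.skew()` above to within floating-point rounding.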
sort_index(ascending=True)

Sort by the index.

Parameters
ascendingbool, default True

Sort ascending vs. descending.

Returns
Series

The original Series sorted by the labels.

Examples

>>> import cudf
>>> series = cudf.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, 4])
>>> series
3    a
2    b
1    c
4    d
dtype: object
>>> series.sort_index()
1    c
2    b
3    a
4    d
dtype: object

Sort Descending

>>> series.sort_index(ascending=False)
4    d
3    a
2    b
1    c
dtype: object
sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)

Sort by the values.

Sort a Series in ascending or descending order by some criterion.

Parameters
ascendingbool, default True

If True, sort values in ascending order, otherwise descending.

na_position{‘first’, ‘last’}, default ‘last’

‘first’ puts nulls at the beginning, ‘last’ puts nulls at the end.

ignore_indexbool, default False

If True, index will not be sorted.

Returns
sorted_objcuDF Series

Notes

Difference from pandas:
  • Not supporting: inplace, kind

Examples

>>> import cudf
>>> s = cudf.Series([1, 5, 2, 4, 3])
>>> s.sort_values()
0    1
2    2
4    3
3    4
1    5
dtype: int64
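
The output above keeps each value paired with its original index label. A minimal plain-Python sketch of that behaviour, including `ascending` and `na_position`, is shown below. The function `sort_values_sketch` is hypothetical, uses `None` as a stand-in for null, and only illustrates the semantics described in the parameter list.

```python
def sort_values_sketch(values, ascending=True, na_position="last"):
    """Return (label, value) pairs ordered by value; None stands in for null."""
    pairs = list(enumerate(values))
    valid = sorted(
        (p for p in pairs if p[1] is not None),
        key=lambda p: p[1],
        reverse=not ascending,
    )
    nulls = [p for p in pairs if p[1] is None]
    # Nulls are placed as a block before or after the sorted valid values.
    return nulls + valid if na_position == "first" else valid + nulls


print(sort_values_sketch([1, 5, 2, 4, 3]))
# [(0, 1), (2, 2), (4, 3), (3, 4), (1, 5)]
print(sort_values_sketch([1, None, 2], ascending=False, na_position="first"))
# [(1, None), (2, 2), (0, 1)]
```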
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return sample standard deviation of the Series.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only

Examples

>>> import cudf
>>> series = cudf.Series([10, 10, 20, 30, 40])
>>> series
0    10
1    10
2    20
3    30
4    40
dtype: int64
>>> series.std()
13.038404810405298
>>> series.std(ddof=2)
15.05545305418162
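
The effect of ddof on the divisor can be reproduced with a few lines of plain Python. The helper `std_with_ddof` below is hypothetical and only mirrors the N - ddof rule described above; it is a sketch, not cuDF's implementation.

```python
import math


def std_with_ddof(values, ddof=1):
    """Standard deviation with divisor N - ddof (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


data = [10, 10, 20, 30, 40]
print(std_with_ddof(data))          # divisor is N - 1 = 4
print(std_with_ddof(data, ddof=2))  # divisor is N - 2 = 3
```

The sum of squared deviations is 680 for this data, so the two results are sqrt(680 / 4) and sqrt(680 / 3).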
property str

Vectorized string functions for Series and Index.

This mimics pandas df.str interface. nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

sub(other, fill_value=None, axis=0)

Subtraction of series and other, element-wise (binary operator sub).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([10, 20, None, 30, None], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a      10
b      20
c    <NA>
d      30
e    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, 30], index=['a', 'c', 'b', 'd'])
>>> b
a       1
c    <NA>
b       2
d      30
dtype: int64
>>> a.sub(b, fill_value=2)
a       9
b      18
c    <NA>
d       0
e    <NA>
dtype: int64
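
The fill_value rule can be sketched in plain Python with dictionaries keyed by index label. This is an illustration only: `sub_with_fill` is a hypothetical helper, `None` stands in for null, and a label missing from one operand is assumed to behave like a null on that side.

```python
def sub_with_fill(left, right, fill_value=None):
    """Label-aligned subtraction on dicts; None stands in for null."""
    result = {}
    for label in sorted(set(left) | set(right)):
        x, y = left.get(label), right.get(label)
        if x is None and y is None:
            # Null (or missing) on both sides: the result stays null.
            result[label] = None
            continue
        if fill_value is not None:
            x = fill_value if x is None else x
            y = fill_value if y is None else y
        result[label] = None if x is None or y is None else x - y
    return result


a = {"a": 10, "b": 20, "c": None, "d": 30, "e": None}
b = {"a": 1, "c": None, "b": 2, "d": 30}
out = sub_with_fill(a, b, fill_value=2)
print(out)
# {'a': 9, 'b': 18, 'c': None, 'd': 0, 'e': None}
```

Labels "c" and "e" are null on both sides, so fill_value is never applied to them.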
subtract(other, fill_value=None, axis=0)

Subtraction of series and other, element-wise (binary operator sub).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([10, 20, None, 30, None], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a      10
b      20
c    <NA>
d      30
e    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, 30], index=['a', 'c', 'b', 'd'])
>>> b
a       1
c    <NA>
b       2
d      30
dtype: int64
>>> a.subtract(b, fill_value=2)
a       9
b      18
c    <NA>
d       0
e    <NA>
dtype: int64
sum(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)

Return sum of the values in the Series.

Parameters
skipnabool, default True

Exclude NA/null values when computing the result.

dtypedata type

Data type to cast the result to.

min_countint, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns
scalar

Notes

Parameters currently not supported are axis, level, numeric_only.

Examples

>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.sum()
15
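
The interaction between skipna and min_count is easier to see in a small plain-Python sketch. The helper `sum_with_min_count` is hypothetical and uses `None` for null; it follows the parameter descriptions above rather than cuDF's internals.

```python
def sum_with_min_count(values, skipna=True, min_count=0):
    """Sum of non-null values, or None when too few valid values exist."""
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        # A null is present and nulls are not being skipped.
        return None
    if len(valid) < min_count:
        return None
    return sum(valid)


print(sum_with_min_count([1, 5, 2, 4, 3]))             # 15
print(sum_with_min_count([None, None]))                # 0
print(sum_with_min_count([None, None], min_count=1))   # None
```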
tail(n=5)

Returns the last n rows as a new Series.

Examples

>>> import cudf
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.tail(2)
3    1
4    0
dtype: int64
take(indices, keep_index=True)

Return Series by taking values from the corresponding indices.

Parameters
indicesarray-like or scalar

An array-like or scalar of integers indicating which positions to take.

keep_indexbool, default True

Whether to retain the index in the result Series or not.

Returns
Series

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12, 13, 14])
>>> series
0    10
1    11
2    12
3    13
4    14
dtype: int64
>>> series.take([0, 4])
0    10
4    14
dtype: int64

If you want to drop the index, pass keep_index=False

>>> series.take([0, 4], keep_index=False)
0    10
1    14
dtype: int64
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
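
The outputs above are consistent with the inputs being interpreted as radians, not degrees (for example, tan of 45.0 is shown as 1.619775 and not 1). A short standard-library sketch of the difference, using `math.tan` as a stand-in for the element-wise operation:

```python
import math

angle = 45.0
as_radians = math.tan(angle)                # treats 45.0 as radians
as_degrees = math.tan(math.radians(angle))  # converts 45 degrees to radians first

print(round(as_radians, 6))  # 1.619775
print(round(as_degrees, 6))  # 1.0
```

If the data holds angles in degrees, converting to radians before calling tan gives the conventional result.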
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Returns
numpy.ndarray

A numpy array representation of the elements in the Series.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12, 13, 14])
>>> series
0    10
1    11
2    12
3    13
4    14
dtype: int64
>>> array = series.to_array()
>>> array
array([10, 11, 12, 13, 14])
>>> type(array)
<class 'numpy.ndarray'>
to_arrow()

Convert Series to a PyArrow Array.

Returns
PyArrow Array

Examples

>>> import cudf
>>> sr = cudf.Series(["a", "b", None])
>>> sr.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7600>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(name=None)

Convert Series into a DataFrame

Parameters
namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

Examples

>>> import cudf
>>> series = cudf.Series(['a', 'b', 'c', None, 'd'], name='sample', index=[10, 11, 12, 13, 15])
>>> series
10       a
11       b
12       c
13    <NA>
15       d
Name: sample, dtype: object
>>> series.to_frame()
   sample
10      a
11      b
12      c
13   <NA>
15      d
to_gpu_array(fillna=None)

Get a dense numba device array for the data.

Parameters
fillnastr or None

See fillna in .to_array.

Returns
numba DeviceNDArray

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

Examples

>>> import cudf
>>> s = cudf.Series([10, 20, 30, 40, 50])
>>> s
0    10
1    20
2    30
3    40
4    50
dtype: int64
>>> s.to_gpu_array()
<numba.cuda.cudadrv.devicearray.DeviceNDArray object at 0x7f1840858890>
to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

To add another DataFrame or Series to an existing HDF file, use append mode and a different key.

For more information see the user guide.

Parameters
path_or_bufstr or pandas.HDFStore

File path or HDFStore object.

keystr

Identifier for the group in the store.

mode{‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

  • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

  • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’: similar to ‘a’, but the file must already exist.

format{‘fixed’, ‘table’}, default ‘fixed’

Possible values:

  • ‘fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

  • ‘table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

appendbool, default False

For Table formats, append the input data to the existing.

data_columnslist of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format=’table’.

complevel{0-9}, optional

Specifies a compression level for data. A value of 0 disables compression.

complib{‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

fletcher32bool, default False

If applying compression use the fletcher32 checksum.

dropnabool, default False

If true, ALL nan rows will not be written to store.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

See also

cudf.io.hdf.read_hdf

Read from HDF file.

cudf.io.parquet.to_parquet

Write a DataFrame to the binary parquet format.

cudf.io.feather.to_feather

Write out feather-format for DataFrames.

to_json(path_or_buf=None, *args, **kwargs)

Convert the cuDF object to a JSON string. Note nulls and NaNs will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters
path_or_bufstring or file handle, optional

File path or object. If not specified, the result is returned as a string.

orientstring

Indication of expected JSON string format.

  • Series
    • default is ‘index’

    • allowed values are: {‘split’,’records’,’index’,’table’}

  • DataFrame
    • default is ‘columns’

    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’,’table’}

  • The format of the JSON string
    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, and the data component is like orient='records'.

date_format{None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precisionint, default 10

The number of decimal places to use when encoding floating point values.

force_asciibool, default True

Force encoded string to be ASCII.

date_unitstring, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handlercallable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.

linesbool, default False

If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

indexbool, default True

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.
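
The orient layouts listed above can be pictured by building them by hand for a tiny two-row table. The sketch below uses only the standard `json` module and does not call cuDF; the sample data and variable names are made up for illustration, and the 'table' orient is omitted because its schema block is not spelled out here.

```python
import json

index = [0, 1]
columns = ["a", "b"]
data = [[1, "x"], [2, "y"]]

# One hand-built payload per orient, following the shapes described above.
layouts = {
    "split": {"index": index, "columns": columns, "data": data},
    "records": [dict(zip(columns, row)) for row in data],
    "index": {str(i): dict(zip(columns, row)) for i, row in zip(index, data)},
    "columns": {
        col: {str(i): row[j] for i, row in zip(index, data)}
        for j, col in enumerate(columns)
    },
    "values": data,
}

for orient, payload in layouts.items():
    print(orient, json.dumps(payload))
```

Note that 'index' and 'columns' are transposes of each other, and that only 'records' and 'values' are list-like at the top level.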

to_pandas(index=True, nullable=False, **kwargs)

Convert to a Pandas Series.

Parameters
indexBoolean, Default True

If index is True, converts the index of cudf.Series and sets it to the pandas.Series. If index is False, no index conversion is performed and pandas.Series will assign a default index.

nullableBoolean, Default False

If nullable is True, the resulting series will have a corresponding nullable Pandas dtype. If nullable is False, the resulting series will either convert null values to np.nan or None depending on the dtype.

Returns
outPandas Series

Examples

>>> import cudf
>>> ser = cudf.Series([-3, 2, 0])
>>> pds = ser.to_pandas()
>>> pds
0   -3
1    2
2    0
dtype: int64
>>> type(pds)
<class 'pandas.core.series.Series'>

nullable parameter can be used to control whether dtype can be Pandas Nullable or not:

>>> ser = cudf.Series([10, 20, None, 30])
>>> ser
0      10
1      20
2    <NA>
3      30
dtype: int64
>>> ser.to_pandas(nullable=True)
0      10
1      20
2    <NA>
3      30
dtype: Int64
>>> ser.to_pandas(nullable=False)
0    10.0
1    20.0
2     NaN
3    30.0
dtype: float64
to_string()

Convert to string

Uses Pandas formatting internals to produce output identical to Pandas. Use the Pandas formatting settings directly in Pandas to control cuDF output.

Returns
str

String representation of Series

Examples

>>> import cudf
>>> series = cudf.Series(['a', None, 'b', 'c', None])
>>> series
0       a
1    <NA>
2       b
3       c
4    <NA>
dtype: object
>>> series.to_string()
'0       a\n1    <NA>\n2       b\n3       c\n4    <NA>\ndtype: object'
truediv(other, fill_value=None, axis=0)

Floating division of series and other, element-wise (binary operator truediv).

Parameters
otherSeries or scalar value
fill_valueNone or value

Value to fill nulls with before computation. If data in both corresponding Series locations is null the result will be null

Returns
Series

The result of the operation.

Examples

>>> import cudf
>>> a = cudf.Series([1, 10, 20, None], index=['a', 'b', 'c', 'd'])
>>> a
a       1
b      10
c      20
d    <NA>
dtype: int64
>>> b = cudf.Series([1, None, 2, None], index=['a', 'b', 'd', 'e'])
>>> b
a       1
b    <NA>
d       2
e    <NA>
dtype: int64
>>> a.truediv(b, fill_value=0)
a     1.0
b     Inf
c     Inf
d     0.0
e    <NA>
dtype: float64
unique()

Returns unique values of this Series.

Returns
Series

A series with only the unique values.

Examples

>>> import cudf
>>> series = cudf.Series(['a', 'a', 'b', None, 'b', None, 'c'])
>>> series
0       a
1       a
2       b
3    <NA>
4       b
5    <NA>
6       c
dtype: object
>>> series.unique()
0    <NA>
1       a
2       b
3       c
dtype: object
update(other)

Modify Series in place using values from passed Series. Uses non-NA values from passed Series to make updates. Aligns on index.

Parameters
otherSeries, or object coercible into Series

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.update(cudf.Series([4, 5, 6]))
>>> s
0    4
1    5
2    6
dtype: int64
>>> s = cudf.Series(['a', 'b', 'c'])
>>> s
0    a
1    b
2    c
dtype: object
>>> s.update(cudf.Series(['d', 'e'], index=[0, 2]))
>>> s
0    d
1    b
2    e
dtype: object
>>> s = cudf.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.update(cudf.Series([4, 5, 6, 7, 8]))
>>> s
0    4
1    5
2    6
dtype: int64

If other contains NaNs the corresponding values are not updated in the original Series.

>>> import numpy as np
>>> s = cudf.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.update(cudf.Series([4, np.nan, 6], nan_as_null=False))
>>> s
0    4
1    2
2    6
dtype: int64

other can also be a non-Series object type that is coercible into a Series

>>> s = cudf.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.update([4, np.nan, 6])
>>> s
0    4
1    2
2    6
dtype: int64
>>> s = cudf.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.update({1: 9})
>>> s
0    1
1    9
2    3
dtype: int64
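
The rules illustrated by the examples above (align on index, skip null values in other, ignore labels that do not exist in the original) can be summarised in a plain-Python sketch. `update_sketch` is a hypothetical helper operating on dictionaries, with `None` standing in for null; it is not cuDF code.

```python
def update_sketch(target, other):
    """Mutate target in place using non-null values from other, matched by label."""
    for label, value in other.items():
        if label in target and value is not None:
            target[label] = value


s = {0: 1, 1: 2, 2: 3}
update_sketch(s, {0: 4, 1: None, 2: 6, 3: 7})
print(s)
# {0: 4, 1: 2, 2: 6}
```

Label 1 keeps its value because the incoming value is null, and label 3 is dropped because it is absent from the original.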
property valid_count

Number of non-null values

value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Return a Series containing counts of unique values.

The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.

Parameters
normalizebool, default False

If True then the object returned will contain the relative frequencies of the unique values.

sortbool, default True

Sort by frequencies.

ascendingbool, default False

Sort in ascending order.

binsint, optional

Rather than count values, group them into half-open bins, works with numeric data. This Parameter is not yet supported.

dropnabool, default True

Don’t include counts of NaN and None.

Returns
resultSeries containing counts of unique values.

See also

Series.count

Number of non-NA elements in a Series.

cudf.core.dataframe.DataFrame.count

Number of non-NA elements in a DataFrame.

Examples

>>> import cudf
>>> sr = cudf.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, None])
>>> sr
0     1.0
1     2.0
2     2.0
3     3.0
4     3.0
5     3.0
6    <NA>
dtype: float64
>>> sr.value_counts()
3.0    3
2.0    2
1.0    1
dtype: int32

The order of the counts can be changed by passing ascending=True:

>>> sr.value_counts(ascending=True)
1.0    1
2.0    2
3.0    3
dtype: int32

With normalize set to True, returns the relative frequency by dividing all values by the sum of values.

>>> sr.value_counts(normalize=True)
3.0    0.500000
2.0    0.333333
1.0    0.166667
dtype: float64

To include NA value counts, pass dropna=False:

>>> sr = cudf.Series([1.0, 2.0, 2.0, 3.0, None, 3.0, 3.0, None])
>>> sr
0     1.0
1     2.0
2     2.0
3     3.0
4    <NA>
5     3.0
6     3.0
7    <NA>
dtype: float64
>>> sr.value_counts(dropna=False)
3.0     3
2.0     2
<NA>    2
1.0     1
dtype: int32
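
The normalize, ascending and dropna options can be mimicked with `collections.Counter` from the standard library. The function `value_counts_sketch` below is a hypothetical illustration with `None` standing in for null; the order of tied counts in cuDF itself is not specified by the text above.

```python
from collections import Counter


def value_counts_sketch(values, normalize=False, ascending=False, dropna=True):
    """Return (value, count) pairs sorted by count; None stands in for null."""
    kept = [v for v in values if not (dropna and v is None)]
    counts = Counter(kept)
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    if normalize:
        total = len(kept)
        # Relative frequency: each count divided by the number of kept values.
        return [(value, count / total) for value, count in ordered]
    return ordered


data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, None]
print(value_counts_sketch(data))
# [(3.0, 3), (2.0, 2), (1.0, 1)]
print(value_counts_sketch(data, normalize=True))
```

With normalize=True the frequencies sum to 1 over the values that were kept, which is why the null row is excluded from the denominator when dropna=True.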
property values

Return a CuPy representation of the Series.

Only the values in the Series will be returned.

Returns
outcupy.ndarray

The values of the Series.

Examples

>>> import cudf
>>> ser = cudf.Series([1, -10, 100, 20])
>>> ser.values
array([  1, -10, 100,  20])
>>> type(ser.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Series.

Only the values in the Series will be returned.

Returns
outnumpy.ndarray

The values of the Series.

Examples

>>> import cudf
>>> ser = cudf.Series([1, -10, 100, 20])
>>> ser.values_host
array([  1, -10, 100,  20])
>>> type(ser.values_host)
<class 'numpy.ndarray'>
var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return unbiased variance of the Series.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters
skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns
scalar

Notes

Parameters currently not supported are axis, level and numeric_only

Examples

>>> import cudf
>>> series = cudf.Series([10, 11, 12, 0, 1])
>>> series
0    10
1    11
2    12
3     0
4     1
dtype: int64
>>> series.var()
33.7
where(cond, other=None, inplace=False)

Replace values where the condition is False.

Parameters
condbool Series/DataFrame, array-like

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    <NA>
3    <NA>
4    <NA>
dtype: int64
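
The keep-or-replace rule can be written as a one-line list comprehension in plain Python. `where_sketch` below is a hypothetical helper for a single column with a scalar `other`, using `None` for null; it only illustrates the Series examples above.

```python
def where_sketch(values, cond, other=None):
    """Keep each value where cond is True, otherwise substitute other."""
    return [v if keep else other for v, keep in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
mask = [v > 2 for v in ser]
print(where_sketch(ser, mask, 10))  # [4, 3, 10, 10, 10]
print(where_sketch(ser, mask))      # [4, 3, None, None, None]
```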

Lists

class cudf.core.column.lists.ListMethods(column, parent=None)

List methods for Series

Attributes
leaves

From a Series of (possibly nested) lists, obtain the elements from the innermost lists as a flat Series (one value per row).

Methods

contains(search_key)

Creates a column of bool values indicating whether the specified scalar is an element of each row of a list column.

get(index)

Extract element at the given index from each component

len()

Computes the length of each element in the Series/Index.

sort_values([ascending, inplace, kind, …])

Sort each list by the values.

take(lists_indices)

Collect list elements based on given indices.

unique()

Returns unique element for each list in the column, order for each unique element is not guaranteed.

contains(search_key)

Creates a column of bool values indicating whether the specified scalar is an element of each row of a list column.

Parameters
search_keyscalar

element being searched for in each row of the list column

Returns
Column

Examples

>>> s = cudf.Series([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
>>> s.list.contains(4)
0    False
1     True
2     True
dtype: bool
get(index)

Extract element at the given index from each component

Extract element from lists, tuples, or strings in each element in the Series/Index.

Parameters
indexint
Returns
Series or Index

Examples

>>> s = cudf.Series([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
>>> s.list.get(-1)
0    3
1    5
2    6
dtype: int64
property leaves

From a Series of (possibly nested) lists, obtain the elements from the innermost lists as a flat Series (one value per row).

Returns
Series

Examples

>>> a = cudf.Series([[[1, None], [3, 4]], None, [[5, 6]]])
>>> a.list.leaves
0       1
1    <NA>
2       3
3       4
4       5
5       6
dtype: int64
len()

Computes the length of each element in the Series/Index.

Returns
Series or Index

Examples

>>> s = cudf.Series([[1, 2, 3], None, [4, 5]])
>>> s
0    [1, 2, 3]
1         None
2       [4, 5]
dtype: list
>>> s.list.len()
0       3
1    <NA>
2       2
dtype: int32
sort_values(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)

Sort each list by the values.

Sort the lists in ascending or descending order by some criterion.

Parameters
ascendingbool, default True

If True, sort values in ascending order, otherwise descending.

na_position{‘first’, ‘last’}, default ‘last’

‘first’ puts nulls at the beginning, ‘last’ puts nulls at the end.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns
ListColumn with each list sorted

Notes

Difference from pandas:
  • Not supporting: inplace, kind

Examples

>>> s = cudf.Series([[4, 2, None, 9], [8, 8, 2], [2, 1]])
>>> s.list.sort_values(ascending=True, na_position="last")
0    [2.0, 4.0, 9.0, nan]
1         [2.0, 8.0, 8.0]
2              [1.0, 2.0]
dtype: list
take(lists_indices)

Collect list elements based on given indices.

Parameters
lists_indices: List type arrays

Specifies what to collect from each row

Returns
ListColumn

Examples

>>> s = cudf.Series([[1, 2, 3], None, [4, 5]])
>>> s
0    [1, 2, 3]
1         None
2       [4, 5]
dtype: list
>>> s.list.take([[0, 1], [], []])
0    [1, 2]
1      None
2        []
dtype: list
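
The row-by-row pairing of lists and index lists can be sketched in plain Python. `list_take_sketch` is a hypothetical helper that uses `None` for a null row and mirrors only the behaviour shown in the example above.

```python
def list_take_sketch(rows, lists_indices):
    """For each row, gather the elements at that row's positions."""
    return [
        None if row is None else [row[i] for i in positions]
        for row, positions in zip(rows, lists_indices)
    ]


rows = [[1, 2, 3], None, [4, 5]]
print(list_take_sketch(rows, [[0, 1], [], []]))
# [[1, 2], None, []]
```

A null row stays null regardless of its index list, and an empty index list on a valid row yields an empty list.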
unique()

Returns unique element for each list in the column, order for each unique element is not guaranteed.

Returns
ListColumn

Examples

>>> s = cudf.Series([[1, 1, 2, None, None], None, [4, 4], []])
>>> s
0    [1.0, 1.0, 2.0, nan, nan]
1                         None
2                   [4.0, 4.0]
3                           []
dtype: list
>>> s.list.unique() # Order of list element is not guaranteed
0              [1.0, 2.0, nan]
1                         None
2                        [4.0]
3                           []
dtype: list

Strings

class cudf.core.column.string.StringMethods(column, parent=None)

Vectorized string functions for Series and Index.

This mimics pandas df.str interface. nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

Methods

byte_count()

Computes the number of bytes of each string in the Series/Index.

capitalize()

Convert strings in the Series/Index to be capitalized.

cat()

Concatenate strings in the Series/Index with given separator.

center(width[, fillchar])

Filling left and right side of strings in the Series/Index with an additional character.

character_ngrams([n])

Generate the n-grams from characters in a column of strings.

character_tokenize()

Each string is split into individual characters.

code_points()

Returns an array by filling it with the UTF-8 code point values for each character of each string.

contains(pat[, case, flags, na, regex])

Test if pattern or regex is contained within a string of a Series or Index.

count(pat[, flags])

Count occurrences of pattern in each string of the Series/Index.

detokenize(indices[, separator])

Combines tokens into strings by concatenating them in the order in which they appear in the indices column.

edit_distance(targets)

The targets strings are measured against the strings in this instance using the Levenshtein edit distance algorithm.

endswith(pat)

Test if the end of each string element matches a pattern.

extract(pat[, flags, expand])

Extract capture groups in the regex pat as columns in a DataFrame.

filter_alphanum([repl, keep])

Remove non-alphanumeric characters from strings in this column.

filter_characters(table[, keep, repl])

Remove characters from each string using the character ranges in the given mapping table.

filter_tokens(min_token_length[, …])

Remove tokens from within each string in the series that are smaller than min_token_length and optionally replace them with the replacement string.

find(sub[, start, end])

Return the lowest index in each string in the Series/Index where the substring is fully contained between [start:end].

findall(pat[, flags, expand])

Find all occurrences of pattern or regular expression in the Series/Index.

get([i])

Extract element from each component at specified position.

htoi()

Returns integer value represented by each hex string.

index(sub[, start, end])

Return the lowest index in each string where the substring is fully contained between [start:end].

insert([start, repl])

Insert the specified string into each string in the specified position.

ip2int()

Converts IP address strings to integers.

is_consonant(position)

Return true for strings where the character at position is a consonant.

is_vowel(position)

Return true for strings where the character at position is a vowel – not a consonant.

isalnum()

Check whether all characters in each string are alphanumeric.

isalpha()

Check whether all characters in each string are alphabetic.

isdecimal()

Check whether all characters in each string are decimal.

isdigit()

Check whether all characters in each string are digits.

isempty()

Check whether each string is an empty string.

isfloat()

Check whether all characters in each string form a floating-point value.

ishex()

Check whether all characters in each string form a hex integer.

isinteger()

Check whether all characters in each string form an integer.

isipv4()

Check whether all characters in each string form an IPv4 address.

islower()

Check whether all characters in each string are lowercase.

isnumeric()

Check whether all characters in each string are numeric.

isspace()

Check whether all characters in each string are whitespace.

istimestamp(format)

Check whether all characters in each string can be converted to a timestamp using the given format.

isupper()

Check whether all characters in each string are uppercase.

join(sep)

Join lists contained as elements in the Series/Index with passed delimiter.

len()

Computes the length of each element in the Series/Index.

ljust(width[, fillchar])

Filling right side of strings in the Series/Index with an additional character.

lower()

Converts all characters to lowercase.

lstrip([to_strip])

Remove leading characters.

match(pat[, case, flags])

Determine if each string matches a regular expression.

ngrams([n, separator])

Generate the n-grams from a set of tokens; each record in the series is treated as a token.

ngrams_tokenize([n, delimiter, separator])

Generate the n-grams using tokens from each string.

normalize_characters([do_lower])

Normalizes strings characters for tokenizing.

normalize_spaces()

Remove extra whitespace between tokens and trim whitespace from the beginning and the end of each string.

pad(width[, side, fillchar])

Pad strings in the Series/Index up to width.

partition([sep, expand])

Split the string at the first occurrence of sep.

porter_stemmer_measure()

Compute the Porter Stemmer measure for each string.

replace(pat, repl[, n, case, flags, regex])

Replace occurrences of pattern/regex in the Series/Index with some other string.

replace_tokens(targets, replacements[, …])

The targets tokens are searched for within each string in the series and replaced with the corresponding replacements if found.

replace_with_backrefs(pat, repl)

Use the repl back-ref template to create a new string with the extracted elements found using the pat expression.

rfind(sub[, start, end])

Return the highest index in each string in the Series/Index where the substring is fully contained between [start:end].

rindex(sub[, start, end])

Return the highest index in each string where the substring is fully contained between [start:end].

rjust(width[, fillchar])

Fill the left side of strings in the Series/Index with an additional character.

rpartition([sep, expand])

Split the string at the last occurrence of sep.

rsplit([pat, n, expand])

Split strings around given separator/delimiter, starting from the end of each string.

rstrip([to_strip])

Remove trailing characters.

slice([start, stop, step])

Slice substrings from each element in the Series or Index.

slice_from(starts, stops)

Return substring of each string using positions for each string.

slice_replace([start, stop, repl])

Replace the specified section of each string with a new string.

split([pat, n, expand])

Split strings around given separator/delimiter.

startswith(pat)

Test if the start of each string element matches a pattern.

strip([to_strip])

Remove leading and trailing characters.

subword_tokenize(hash_file[, max_length, …])

Run CUDA BERT subword tokenizer on cuDF strings column.

swapcase()

Change each lowercase character to uppercase and vice versa.

title()

Uppercase the first letter of each word and lowercase the rest.

token_count([delimiter])

Each string is split into tokens using the provided delimiter.

tokenize([delimiter])

Each string is split into tokens using the provided delimiter(s).

translate(table)

Map all characters in the string through the given mapping table.

upper()

Convert each string to uppercase.

url_decode()

Returns a URL-decoded format of each string.

url_encode()

Returns a URL-encoded format of each string.

wrap(width, **kwargs)

Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.

zfill(width)

Pad strings in the Series/Index by prepending ‘0’ characters.

byte_count() → Union[cudf.Series, cudf.Index]

Computes the number of bytes of each string in the Series/Index.

Returns
Series or Index of int

A Series or Index of integer values indicating the number of bytes of each string in the Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(["abc","d","ef"])
>>> s.str.byte_count()
0    3
1    1
2    2
dtype: int32
>>> s = cudf.Series(["Hello", "Bye", "Thanks 😊"])
>>> s.str.byte_count()
0     5
1     3
2    11
dtype: int32
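
The byte counts above differ from character counts whenever a string contains non-ASCII characters. As a minimal CPU-side sketch, assuming the strings are stored as UTF-8, the same numbers can be reproduced in plain Python. The helper name utf8_byte_count is hypothetical and is not part of cuDF.

```python
def utf8_byte_count(strings):
    """Return the UTF-8 byte length of each string; None stays None."""
    return [None if s is None else len(s.encode("utf-8")) for s in strings]


# "\U0001F60A" is the smiling-face emoji used in the example above;
# it occupies 4 bytes in UTF-8, so "Thanks " (7 bytes) + emoji = 11.
counts = utf8_byte_count(["Hello", "Bye", "Thanks \U0001F60A"])
print(counts)
```

Use len() instead when the number of characters, not bytes, is needed.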
capitalize() → Union[cudf.Series, cudf.Index]

Convert strings in the Series/Index to be capitalized. This only applies to ASCII characters at this time.

Returns
Series or Index of object

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object
>>> s = cudf.Series(["hello, friend","goodbye, friend"])
>>> s.str.capitalize()
0      Hello, friend
1    Goodbye, friend
dtype: object
cat(sep: str = None, na_rep: str = None) → str
cat(others, sep: str = None, na_rep: str = None) → Union[cudf.Series, cudf.Index, cudf.core.column.string.StringColumn]

Concatenate strings in the Series/Index with given separator.

If others is specified, this function concatenates the Series/Index and elements of others element-wise. If others is not passed, then all values in the Series/Index are concatenated into a single string with a given sep.

Parameters
othersSeries or List of str

Strings to be appended. The number of strings must match size() of this instance. This must be either a Series of string dtype or a Python list of strings.

sepstr

If specified, this separator will be appended to each string before appending the others.

na_repstr

This character will take the place of any null strings (not empty strings) in either list.

  • If na_rep is None, and others is None, missing values in the Series/Index are omitted from the result.

  • If na_rep is None, and others is not None, a row containing a missing value in any of the columns (before concatenation) will have a missing value in the result.

Returns
concatstr or Series/Index of str dtype

If others is None, str is returned, otherwise a Series/Index (same type as caller) of str dtype is returned.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'b', None, 'd'])
>>> s.str.cat(sep=' ')
'a b d'

By default, NA values in the Series are ignored. Using na_rep, they can be given a representation:

>>> s.str.cat(sep=' ', na_rep='?')
'a b ? d'

If others is specified, corresponding values are concatenated with the separator. Result will be a Series of strings.

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',')
0     a,A
1     b,B
2    <NA>
3     d,D
dtype: object

Missing values will remain missing in the result, but can again be represented using na_rep:

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
0    a,A
1    b,B
2    -,C
3    d,D
dtype: object

If sep is not specified, the values are concatenated without separation.

>>> s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
0    aA
1    bB
2    -C
3    dD
dtype: object
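
The null-handling rules for na_rep can be summarized with a small pure-Python sketch. It only mirrors the behavior described and shown above on Python lists; the function names cat_all and cat_elementwise are hypothetical, and this is not how cuDF implements cat.

```python
def cat_all(values, sep="", na_rep=None):
    """Join every value into one string (the case where others is None)."""
    if na_rep is None:
        kept = [v for v in values if v is not None]  # nulls are omitted
    else:
        kept = [na_rep if v is None else v for v in values]
    return sep.join(kept)


def cat_elementwise(values, others, sep="", na_rep=None):
    """Concatenate row by row (the case where others is given)."""
    out = []
    for left, right in zip(values, others):
        if na_rep is None and (left is None or right is None):
            out.append(None)  # a null on either side gives a null row
            continue
        left = na_rep if left is None else left
        right = na_rep if right is None else right
        out.append(f"{left}{sep}{right}")
    return out


s = ["a", "b", None, "d"]
print(cat_all(s, sep=" "))
print(cat_elementwise(s, ["A", "B", "C", "D"], sep=",", na_rep="-"))
```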
center(width: int, fillchar: str = ' ') → Union[cudf.Series, cudf.Index]

Fill the left and right sides of strings in the Series/Index with an additional character.

Parameters
widthint

Minimum width of resulting string; additional characters will be filled with fillchar.

fillcharstr, default is ‘ ‘ (whitespace)

Additional character for filling.

Returns
Series/Index of str dtype

Returns Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'b', None, 'd'])
>>> s.str.center(1)
0       a
1       b
2    <NA>
3       d
dtype: object
>>> s.str.center(1, fillchar='-')
0       a
1       b
2    <NA>
3       d
dtype: object
>>> s.str.center(2, fillchar='-')
0      a-
1      b-
2    <NA>
3      d-
dtype: object
>>> s.str.center(5, fillchar='-')
0    --a--
1    --b--
2     <NA>
3    --d--
dtype: object
>>> s.str.center(6, fillchar='-')
0    --a---
1    --b---
2      <NA>
3    --d---
dtype: object
character_ngrams(n: int = 2) → Union[cudf.Series, cudf.Index]

Generate the n-grams from characters in a column of strings.

Parameters
nint

The degree of the n-gram (number of consecutive characters). Default of 2 for bigrams.

Examples

>>> import cudf
>>> str_series = cudf.Series(['abcd','efgh','xyz'])
>>> str_series.str.character_ngrams(2)
0    ab
1    bc
2    cd
3    ef
4    fg
5    gh
6    xy
7    yz
dtype: object
>>> str_series.str.character_ngrams(3)
0    abc
1    bcd
2    efg
3    fgh
4    xyz
dtype: object
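
The output above is a sliding window of width n over each string, with the results from all strings flattened into one sequence. A minimal pure-Python sketch of that idea follows. The function is an illustration on Python lists, not cuDF's implementation, and skipping strings shorter than n is an assumption that the examples above do not cover.

```python
def character_ngrams(strings, n=2):
    """Return all length-n character windows from every string, in order."""
    out = []
    for s in strings:
        # range() is empty when len(s) < n, so short strings add nothing
        out.extend(s[i:i + n] for i in range(len(s) - n + 1))
    return out


print(character_ngrams(["abcd", "efgh", "xyz"], 3))
```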
character_tokenize() → Union[cudf.Series, cudf.Index]

Each string is split into individual characters. The sequence returned contains each character as an individual string.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> data = ["hello world", None, "goodbye, thank you."]
>>> ser = cudf.Series(data)
>>> ser.str.character_tokenize()
0     h
1     e
2     l
3     l
4     o
5
6     w
7     o
8     r
9     l
10    d
11    g
12    o
13    o
14    d
15    b
16    y
17    e
18    ,
19
20    t
21    h
22    a
23    n
24    k
25
26    y
27    o
28    u
29    .
dtype: object
code_points() → Union[cudf.Series, cudf.Index]

Returns an array by filling it with the UTF-8 code point values for each character of each string. This function uses the len() method to determine the size of each sub-array of integers.

Returns
Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(["a","xyz", "éee"])
>>> s.str.code_points()
0       97
1      120
2      121
3      122
4    50089
5      101
6      101
dtype: int32
>>> s = cudf.Series(["abc"])
>>> s.str.code_points()
0    97
1    98
2    99
dtype: int32
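
Note that the value 50089 shown for 'é' is not the Unicode code point (233) but the character's UTF-8 bytes (0xC3 0xA9) read as one big-endian integer. The following pure-Python sketch reproduces the values above under that reading; the helper name utf8_code_points is hypothetical.

```python
def utf8_code_points(strings):
    """Return the UTF-8 encoding of each character as a single integer."""
    out = []
    for s in strings:
        for ch in s:
            # Interpret the 1 to 4 UTF-8 bytes of the character as one integer
            out.append(int.from_bytes(ch.encode("utf-8"), "big"))
    return out


# "\u00e9" is 'é' as a single precomposed character
print(utf8_code_points(["a", "xyz", "\u00e9ee"]))
print(ord("\u00e9"))  # the Unicode code point, for comparison
```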
contains(pat: Union[str, Sequence], case: bool = True, flags: int = 0, na=nan, regex: bool = True) → Union[cudf.Series, cudf.Index]

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

Parameters
patstr or list-like

Character sequence or regular expression. If pat is list-like then regular expressions are not accepted.

regexbool, default True

If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.

Returns
Series/Index of bool dtype

A Series/Index of boolean dtype indicating whether the given pattern is contained within the string of each element of the Series/Index.

Notes

The parameters case, flags, and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Examples

>>> import cudf
>>> s1 = cudf.Series(['Mouse', 'dog', 'house and parrot', '23', None])
>>> s1
0               Mouse
1                 dog
2    house and parrot
3                  23
4                <NA>
dtype: object
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     <NA>
dtype: bool

Returning an Index of booleans using only a literal pattern.

>>> import numpy as np
>>> data = ['Mouse', 'dog', 'house and parrot', '23.0', np.nan]
>>> idx = cudf.Index(data)
>>> idx
StringIndex(['Mouse' 'dog' 'house and parrot' '23.0' None], dtype='object')
>>> idx.str.contains('23', regex=False)
GenericIndex([False, False, False, True, <NA>], dtype='bool')

Returning True when either ‘house’ or ‘dog’ occurs in a string.

>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     <NA>
dtype: bool

Returning any digit using regular expression.

>>> s1.str.contains(r'\d', regex=True)
0    False
1    False
2    False
3     True
4     <NA>
dtype: bool

Ensure pat is not a literal pattern when regex is set to True. Note in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.

>>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
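
The same pitfall can be reproduced on the CPU with Python's re module, which is used here only for illustration (cuDF uses its own regex engine). The sketch compares the unescaped pattern with an escaped one. In cuDF itself, passing regex=False as described above is the direct way to get literal matching.

```python
import re

values = ["40", "40.0", "41", "41.0", "35"]

# '.' is a wildcard: '.0' matches any character followed by '0'
as_regex = [re.search(".0", v) is not None for v in values]

# re.escape() turns the pattern into a literal '.0'
as_literal = [re.search(re.escape(".0"), v) is not None for v in values]

print(as_regex)
print(as_literal)
```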

The pat may also be a list of strings in which case the individual strings are searched in corresponding rows.

>>> s2 = cudf.Series(['house', 'dog', 'and', '', ''])
>>> s1.str.contains(s2)
0    False
1     True
2     True
3     True
4     <NA>
dtype: bool
count(pat: str, flags: int = 0) → Union[cudf.Series, cudf.Index]

Count occurrences of pattern in each string of the Series/Index.

This function is used to count the number of times a particular regex pattern is repeated in each of the string elements of the Series.

Parameters
patstr

Valid regular expression.

Returns
Series or Index

Notes

  • flags parameter is currently not supported.

  • Some characters need to be escaped when passing in pat, e.g. '$' has a special meaning in regex and must be escaped when finding this literal character.

Examples

>>> import cudf
>>> s = cudf.Series(['A', 'B', 'Aaba', 'Baca', None, 'CABA', 'cat'])
>>> s.str.count('a')
0       0
1       0
2       2
3       2
4    <NA>
5       0
6       1
dtype: int32

Escape '$' to find the literal dollar sign.

>>> s = cudf.Series(['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat'])
>>> s.str.count(r'\$')
0    1
1    0
2    1
3    2
4    2
5    0
dtype: int32

This is also available on Index.

>>> index = cudf.core.index.StringIndex(['A', 'A', 'Aaba', 'cat'])
>>> index.str.count('a')
Int64Index([0, 0, 2, 1], dtype='int64')
detokenize(indices: cudf.Series, separator: str = ' ') → Union[cudf.Series, cudf.Index]

Combines tokens into strings by concatenating them in the order in which they appear in the indices column. The separator is concatenated between each token.

Parameters
indicesSeries

Each value identifies the output row for the corresponding token.

separatorstr

The string concatenated between each token in an output row. Default is space.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> strs = cudf.Series(["hello", "world", "one", "two", "three"])
>>> indices = cudf.Series([0, 0, 1, 1, 2])
>>> strs.str.detokenize(indices)
0    hello world
1        one two
2          three
dtype: object
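
Conceptually, detokenize groups tokens by their output row and joins each group with the separator. A minimal pure-Python sketch of that grouping follows. It works on plain lists, the function is illustrative rather than cuDF's implementation, and emitting rows in sorted index order is an assumption.

```python
def detokenize(tokens, indices, separator=" "):
    """Join tokens that share an output-row index, keeping token order."""
    rows = {}
    for token, row in zip(tokens, indices):
        rows.setdefault(row, []).append(token)
    # Assumption: output rows are emitted in ascending index order
    return [separator.join(rows[row]) for row in sorted(rows)]


tokens = ["hello", "world", "one", "two", "three"]
print(detokenize(tokens, [0, 0, 1, 1, 2]))
```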
edit_distance(targets) → Union[cudf.Series, cudf.Index]

The targets strings are measured against the strings in this instance using the Levenshtein edit distance algorithm. https://www.cuelogic.com/blog/the-levenshtein-algorithm

The targets parameter may also be a single string in which case the edit distance is computed for all the strings against that single string.

Parameters
targetsarray-like, Sequence or Series or str

The string(s) to measure against each string.

Returns
Series or Index of int32.

Examples

>>> import cudf
>>> sr = cudf.Series(["puppy", "doggy", "kitty"])
>>> targets = cudf.Series(["pup", "dogie", "kitten"])
>>> sr.str.edit_distance(targets=targets)
0    2
1    2
2    2
dtype: int32
>>> sr.str.edit_distance("puppy")
0    0
1    4
2    4
dtype: int32
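
For readers unfamiliar with the Levenshtein distance, the following pure-Python sketch computes it with the standard dynamic-programming recurrence and reproduces the values shown above. It is a CPU reference for understanding the metric, not cuDF's GPU implementation; the function name levenshtein is chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


pairs = zip(["puppy", "doggy", "kitty"], ["pup", "dogie", "kitten"])
print([levenshtein(s, t) for s, t in pairs])
```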
endswith(pat: str) → Union[cudf.Series, cudf.Index]

Test if the end of each string element matches a pattern.

Parameters
patstr or list-like

If pat is a str, evaluates whether each string of the Series ends with pat. If pat is list-like, evaluates whether self[i] ends with pat[i]. Regular expressions are not accepted.

Returns
Series or Index of bool

A Series of booleans indicating whether the given pattern matches the end of each string element.

Notes

na parameter is not yet supported, as cudf uses native strings instead of Python objects.

Examples

>>> import cudf
>>> s = cudf.Series(['bat', 'bear', 'caT', None])
>>> s
0     bat
1    bear
2     caT
3    <NA>
dtype: object
>>> s.str.endswith('t')
0     True
1    False
2    False
3     <NA>
dtype: bool
extract(pat: str, flags: int = 0, expand: bool = True) → Union[cudf.Series, cudf.Index]

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

Parameters
patstr

Regular expression pattern with capturing groups.

expandbool, default True

If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups.

Returns
DataFrame or Series/Index

A DataFrame with one row for each subject string, and one column for each group. If expand=False and pat has only one capture group, then return a Series/Index.

Notes

The flags parameter is not yet supported and will raise a NotImplementedError if anything other than the default value is passed.

Examples

>>> import cudf
>>> s = cudf.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
      0     1
0     a     1
1     b     2
2  <NA>  <NA>

A pattern with one group will return a DataFrame with one column if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)
      0
0     1
1     2
2  <NA>

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)
0       1
1       2
2    <NA>
dtype: object
filter_alphanum(repl: Optional[str] = None, keep: bool = True) → Union[cudf.Series, cudf.Index]

Remove non-alphanumeric characters from strings in this column.

Parameters
replstr

Optional string to use in place of removed characters.

keepbool

Set to False to remove all alphanumeric characters instead of keeping them.

Returns
Series/Index of str dtype

Strings with only alphanumeric characters.

Examples

>>> import cudf
>>> s = cudf.Series(["pears £12", "plums $34", "Temp 72℉", "100K℧"])
>>> s.str.filter_alphanum(" ")
0    pears  12
1    plums  34
2     Temp 72
3        100K
dtype: object
filter_characters(table: dict, keep: bool = True, repl: Optional[str] = None) → Union[cudf.Series, cudf.Index]

Remove characters from each string using the character ranges in the given mapping table.

Parameters
tabledict

This table is a range of Unicode ordinals to filter. The minimum value is the key and the maximum value is the value. You can use str.maketrans() as a helper function for making the filter table. Overlapping ranges will cause undefined results. Range values are inclusive.

keepboolean

If False, the character ranges in the table are removed. If True, the character ranges not in the table are removed. Default is True.

replstr

Optional replacement string to use in place of removed characters.

Returns
Series or Index.

Examples

>>> import cudf
>>> data = ['aeiou', 'AEIOU', '0123456789']
>>> s = cudf.Series(data)
>>> s.str.filter_characters({'a':'l', 'M':'Z', '4':'6'})
0    aei
1     OU
2    456
dtype: object
>>> s.str.filter_characters({'a':'l', 'M':'Z', '4':'6'}, False, "_")
0         ___ou
1         AEI__
2    0123___789
dtype: object
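
The table semantics (each key is the inclusive lower bound of a range and its value the inclusive upper bound) can be sketched in pure Python as follows. The function works on plain lists and single-character keys and values as in the example above; it is an illustration, not cuDF's implementation.

```python
def filter_characters(strings, table, keep=True, repl=""):
    """Keep (or remove) characters that fall in the inclusive ranges of table."""
    ranges = [(ord(low), ord(high)) for low, high in table.items()]

    def in_table(ch):
        return any(low <= ord(ch) <= high for low, high in ranges)

    # A character survives when its membership matches the keep flag;
    # otherwise it is replaced by repl (empty string means removal).
    return ["".join(ch if in_table(ch) == keep else repl for ch in s)
            for s in strings]


data = ["aeiou", "AEIOU", "0123456789"]
table = {"a": "l", "M": "Z", "4": "6"}
print(filter_characters(data, table))
print(filter_characters(data, table, keep=False, repl="_"))
```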
filter_tokens(min_token_length: int, replacement: Optional[str] = None, delimiter: Optional[str] = None) → Union[cudf.Series, cudf.Index]

Remove tokens from within each string in the series that are smaller than min_token_length and optionally replace them with the replacement string. Tokens are identified by the delimiter character provided.

Parameters
min_token_length: int

Minimum number of characters for a token to be retained in the output string.

replacementstr

String used in place of removed tokens.

delimiterstr

The character(s) used to locate the tokens of each string. Default is whitespace.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> sr.str.filter_tokens(3, replacement="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.filter_tokens(5,None,";")
0             ;;
1    theme;music
2
dtype: object
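
In the second example the short tokens are removed but the delimiters remain, which is why the first row becomes ';;'. A pure-Python sketch that reproduces both examples is shown below. It is illustrative only: it assumes a single-space delimiter when none is given, whereas the description above says whitespace in general, and it leaves empty tokens untouched.

```python
def filter_tokens(strings, min_token_length, replacement="", delimiter=" "):
    """Replace tokens shorter than min_token_length, keeping delimiters."""
    out = []
    for s in strings:
        tokens = s.split(delimiter)
        kept = [t if (t == "" or len(t) >= min_token_length) else replacement
                for t in tokens]
        out.append(delimiter.join(kept))
    return out


print(filter_tokens(["this is me", "theme music", ""], 3, replacement="_"))
print(filter_tokens(["this;is;me", "theme;music", ""], 5, delimiter=";"))
```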
find(sub: str, start: int = 0, end: Optional[int] = None) → Union[cudf.Series, cudf.Index]

Return the lowest index in each string in the Series/Index where the substring is fully contained between [start:end]. Return -1 on failure.

Parameters
substr

Substring being searched.

startint

Left edge index.

endint

Right edge index.

Returns
Series or Index of int

Examples

>>> import cudf
>>> s = cudf.Series(['abc', 'a','b' ,'ddb'])
>>> s.str.find('b')
0    1
1   -1
2    0
3    2
dtype: int32

Parameters such as start and end can also be used.

>>> s.str.find('b', start=1, end=5)
0    1
1   -1
2   -1
3    2
dtype: int32
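
The results above match what Python's built-in str.find returns for each element. The sketch below applies str.find over a plain list to show how start and end restrict the search; treating the two as equivalent beyond these examples is an assumption, and the helper name find_each is hypothetical.

```python
def find_each(strings, sub, start=0, end=None):
    """Apply str.find element-wise; None entries stay None."""
    return [None if s is None else s.find(sub, start, end) for s in strings]


values = ["abc", "a", "b", "ddb"]
print(find_each(values, "b"))
# With start=1 the lone 'b' at position 0 of the third string is skipped
print(find_each(values, "b", start=1, end=5))
```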
findall(pat: str, flags: int = 0, expand: bool = True) → Union[cudf.Series, cudf.Index]

Find all occurrences of pattern or regular expression in the Series/Index.

Parameters
patstr

Pattern or regular expression.

Returns
DataFrame

All non-overlapping matches of pattern or regular expression in each string of this Series/Index.

Notes

flags parameter is currently not supported.

Examples

>>> import cudf
>>> s = cudf.Series(['Lion', 'Monkey', 'Rabbit'])

The search for the pattern ‘Monkey’ returns one match:

>>> s.str.findall('Monkey')
        0
0    <NA>
1  Monkey
2    <NA>

When the pattern matches more than one string in the Series, all matches are returned:

>>> s.str.findall('on')
      0
0    on
1    on
2  <NA>

Regular expressions are supported too. For instance, the search for all the strings ending with the word ‘on’ is shown next:

>>> s.str.findall('on$')
      0
0    on
1  <NA>
2  <NA>

If the pattern is found more than once in the same string, then multiple strings are returned as columns:

>>> s.str.findall('b')
      0     1
0  <NA>  <NA>
1  <NA>  <NA>
2     b     b
get(i: int = 0) → Union[cudf.Series, cudf.Index]

Extract element from each component at specified position.

Parameters
iint

Position of element to extract.

Returns
Series/Index of str dtype

Examples

>>> import cudf
>>> s = cudf.Series(["hello world", "rapids", "cudf"])
>>> s
0    hello world
1         rapids
2           cudf
dtype: object
>>> s.str.get(10)
0    d
1
2
dtype: object
>>> s.str.get(1)
0    e
1    a
2    u
dtype: object

get also accepts negative index number.

>>> s.str.get(-1)
0    d
1    s
2    f
dtype: object
htoi() → Union[cudf.Series, cudf.Index]

Returns integer value represented by each hex string. String is interpreted to have hex (base-16) characters.

Returns
Series/Index of int dtype

Examples

>>> import cudf
>>> s = cudf.Series(["1234", "ABCDEF", "1A2", "cafe"])
>>> s.str.htoi()
0        4660
1    11259375
2         418
3       51966
dtype: int64
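
For plain hex digit strings like those above, the conversion corresponds to Python's int(s, 16). The sketch below reproduces the example on a Python list. Note that int() also accepts forms such as a '0x' prefix or a leading sign; whether htoi handles those the same way is not stated here, so the sketch should not be read as a specification. The helper name hex_to_int is hypothetical.

```python
def hex_to_int(strings):
    """Parse each string as a base-16 integer."""
    return [int(s, 16) for s in strings]


print(hex_to_int(["1234", "ABCDEF", "1A2", "cafe"]))
```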
index(sub: str, start: int = 0, end: Optional[int] = None) → Union[cudf.Series, cudf.Index]

Return the lowest index in each string where the substring is fully contained between [start:end]. This is the same as str.find except instead of returning -1, it raises a ValueError when the substring is not found.

Parameters
substr

Substring being searched.

startint

Left edge index.

endint

Right edge index.

Returns
Series or Index of int

Examples

>>> import cudf
>>> s = cudf.Series(['abc', 'a','b' ,'ddb'])
>>> s.str.index('b')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found

Parameters such as start and end can also be used.

>>> s = cudf.Series(['abc', 'abb','ab' ,'ddb'])
>>> s.str.index('b', start=1, end=5)
0    1
1    1
2    1
3    2
dtype: int32
insert(start: int = 0, repl: Optional[str] = None) → Union[cudf.Series, cudf.Index]

Insert the specified string into each string in the specified position.

Parameters
startint

Position at which to insert the string. Default is the beginning of each string. Specify -1 to insert at the end of each string.

replstr

String to insert into the specified position value.

Returns
Series/Index of str dtype

A new string series with the specified string inserted at the specified position.

Examples

>>> import cudf
>>> s = cudf.Series(["abcdefghij", "0123456789"])
>>> s.str.insert(2, '_')
0    ab_cdefghij
1    01_23456789
dtype: object

When no repl is passed, nothing is inserted.

>>> s.str.insert(2)
0    abcdefghij
1    0123456789
dtype: object

Negative values are also supported for start.

>>> s.str.insert(-1,'_')
0    abcdefghij_
1    0123456789_
dtype: object
ip2int() → Union[cudf.Series, cudf.Index]

Convert IP strings to integers.

Returns
Series/Index of int dtype

Examples

>>> import cudf
>>> s = cudf.Series(["12.168.1.1", "10.0.0.1"])
>>> s.str.ip2int()
0    212336897
1    167772161
dtype: int64

Returns 0 for any string that is not an IP address.

>>> s = cudf.Series(["12.168.1.1", "10.0.0.1", "abc"])
>>> s.str.ip2int()
0    212336897
1    167772161
2            0
dtype: int64
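
The integers above are the four octets of the address packed into one value, most significant octet first. A minimal pure-Python sketch of that arithmetic follows. The validation rules (exactly four ASCII-digit parts, each at most 255) are assumptions made for the sketch, and the function name ip_to_int is hypothetical.

```python
def ip_to_int(address):
    """Pack a dotted-quad IPv4 string into an integer; 0 if it is not one."""
    parts = address.split(".")
    valid = len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts
    )
    if not valid:
        return 0
    value = 0
    for p in parts:
        value = (value << 8) | int(p)  # shift in one octet at a time
    return value


print([ip_to_int(a) for a in ["12.168.1.1", "10.0.0.1", "abc"]])
```

For example, "10.0.0.1" is 10 * 256**3 + 0 * 256**2 + 0 * 256 + 1 = 167772161.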
is_consonant(position) → Union[cudf.Series, cudf.Index]

Return true for strings where the character at position is a consonant. The position parameter may also be a list of integers to check different characters per string. If the position is larger than the string length, False is returned for that string.

Parameters
position: int or list-like

The character position to check within each string.

Returns
Series or Index of bool dtype.

Examples

>>> import cudf
>>> ser = cudf.Series(["toy", "trouble"])
>>> ser.str.is_consonant(1)
0    False
1     True
dtype: bool
>>> positions = cudf.Series([2, 3])
>>> ser.str.is_consonant(positions)
0     True
1    False
dtype: bool
is_vowel(position) → Union[cudf.Series, cudf.Index]

Return true for strings where the character at position is a vowel – not a consonant. The position parameter may also be a list of integers to check different characters per string. If the position is larger than the string length, False is returned for that string.

Parameters
position: int or list-like

The character position to check within each string.

Returns
Series or Index of bool dtype.

Examples

>>> import cudf
>>> ser = cudf.Series(["toy", "trouble"])
>>> ser.str.is_vowel(1)
0     True
1    False
dtype: bool
>>> positions = cudf.Series([2, 3])
>>> ser.str.is_vowel(positions)
0    False
1     True
dtype: bool
isalnum() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string are alphanumeric.

This is equivalent to running the Python string method str.isalnum() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

Equivalent to: isalpha() or isdigit() or isnumeric() or isdecimal()

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s1 = cudf.Series(['one', 'one1', '1', ''])
>>> s1.str.isalnum()
0     True
1     True
2     True
3    False
dtype: bool

Note that checks against characters mixed with any additional punctuation or whitespace will evaluate to false for an alphanumeric check.

>>> s2 = cudf.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum()
0    False
1    False
2    False
dtype: bool
isalpha() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string are alphabetic.

This is equivalent to running the Python string method str.isalpha() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s1 = cudf.Series(['one', 'one1', '1', ''])
>>> s1.str.isalpha()
0     True
1    False
2    False
3    False
dtype: bool
isdecimal() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string are decimal.

This is equivalent to running the Python string method str.isdecimal() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s3 = cudf.Series(['23', '³', '⅕', ''])

The s3.str.isdecimal method checks for characters used to form numbers in base 10.

>>> s3.str.isdecimal()
0     True
1    False
2    False
3    False
dtype: bool
isdigit() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string are digits.

This is equivalent to running the Python string method str.isdigit() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s = cudf.Series(['23', '³', '⅕', ''])

The s.str.isdigit method is the same as s.str.isdecimal but also includes special digits, like superscripted and subscripted digits in unicode.

>>> s.str.isdigit()
0     True
1     True
2    False
3    False
dtype: bool
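
Because these checks are described as equivalent to the Python string methods, the difference between isdecimal, isdigit, and isnumeric can be seen side by side with plain Python strings. The sketch below uses only built-in str methods on the same sample values; the isnumeric row is included for comparison.

```python
# '23', superscript three (U+00B3), vulgar fraction one fifth (U+2155), empty
samples = ["23", "\u00b3", "\u2155", ""]

decimal = [s.isdecimal() for s in samples]  # base-10 digits only
digit = [s.isdigit() for s in samples]      # also superscript/subscript digits
numeric = [s.isnumeric() for s in samples]  # also fractions and similar

print(decimal)
print(digit)
print(numeric)
```

Each check is progressively broader: every decimal string is also a digit string, and every digit string is also numeric.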
isempty() → Union[cudf.Series, cudf.Index]

Check whether each string is an empty string.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(["1", "abc", "", " ", None])
>>> s.str.isempty()
0    False
1    False
2     True
3    False
4    False
dtype: bool
isfloat() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string form a floating-point value.

If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s = cudf.Series(["1.1", "0.123213", "+0.123", "-100.0001", "234",
... "3-"])
>>> s.str.isfloat()
0     True
1     True
2     True
3     True
4     True
5    False
dtype: bool
>>> s = cudf.Series(["this is plain text", "\t\n", "9.9", "9.9.9"])
>>> s.str.isfloat()
0    False
1    False
2     True
3    False
dtype: bool
ishex() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string form a hex integer.

If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

Examples

>>> import cudf
>>> s = cudf.Series(["", "123DEF", "0x2D3", "-15", "abc"])
>>> s.str.ishex()
0    False
1     True
2     True
3    False
4     True
dtype: bool
isinteger() → Union[cudf.Series, cudf.Index]

Check whether all characters in each string form an integer.

If a string has zero characters, False is returned for that check.

Returns
Series or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s = cudf.Series(["1", "0.1", "+100", "-15", "abc"])
>>> s.str.isinteger()
0     True
1    False
2     True
3     True
4    False
dtype: bool
>>> s = cudf.Series(["this is plain text", "", "10 10"])
>>> s.str.isinteger()
0    False
1    False
2    False
dtype: bool
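
Judging from the examples, a string counts as an integer when it is an optional sign followed by one or more digits. A minimal pure-Python sketch of that rule follows; the regular expression and the name is_integer_string are assumptions for illustration, not cuDF's implementation.

```python
import re

# Optional sign, then at least one ASCII digit, and nothing else.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_integer_string(text):
    """Return True if the whole string is a signed or unsigned integer."""
    return INTEGER_PATTERN.fullmatch(text) is not None


samples = ["1", "0.1", "+100", "-15", "abc", "", "10 10"]
results = [is_integer_string(s) for s in samples]
print(results)
```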
isipv4()Union[cudf.Series, cudf.Index]

Check whether all characters in each string form an IPv4 address.

If a string has zero characters, False is returned for that check.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(["", "127.0.0.1", "255.255.255.255", "123.456"])
>>> s.str.isipv4()
0    False
1     True
2     True
3    False
dtype: bool
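
For comparison, a similar check can be written on the CPU with Python's standard ipaddress module. This is only a sketch: the helper name is_ipv4_string is hypothetical, and the standard library may differ from cuDF on edge cases such as octets with leading zeros.

```python
import ipaddress


def is_ipv4_string(text):
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


samples = ["", "127.0.0.1", "255.255.255.255", "123.456"]
results = [is_ipv4_string(s) for s in samples]
print(results)
```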
islower()Union[cudf.Series, cudf.Index]

Check whether all characters in each string are lowercase.

This is equivalent to running the Python string method str.islower() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s = cudf.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
>>> s.str.islower()
0     True
1    False
2    False
3    False
dtype: bool
isnumeric()Union[cudf.Series, cudf.Index]

Check whether all characters in each string are numeric.

This is equivalent to running the Python string method str.isnumeric() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s1 = cudf.Series(['one', 'one1', '1', ''])
>>> s1.str.isnumeric()
0    False
1    False
2     True
3    False
dtype: bool

The isnumeric method accepts everything that isdigit accepts, and also other characters that can represent quantities, such as Unicode fractions.

>>> s2 = cudf.Series(['23', '³', '⅕', ''])
>>> s2.str.isnumeric()
0     True
1     True
2     True
3    False
dtype: bool
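
Because the method is documented as equivalent to Python's str.isnumeric(), the difference between the three related checks can be seen with built-in strings alone. The snippet below is a plain-Python illustration and does not call cuDF.

```python
samples = ["23", "³", "⅕", ""]

# One row per sample: (text, isdecimal, isdigit, isnumeric)
table = [(s, s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples]

for row in table:
    print(row)
```

The superscript "³" is a digit but not a decimal, and the fraction "⅕" is numeric only, so the three predicates form progressively wider sets.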
isspace()Union[cudf.Series, cudf.Index]

Check whether all characters in each string are whitespace.

This is equivalent to running the Python string method str.isspace() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isupper

Check whether all characters are uppercase.

Examples

>>> import cudf
>>> s = cudf.Series([' ', '\t\r\n ', ''])
>>> s.str.isspace()
0     True
1     True
2    False
dtype: bool
istimestamp(format: str)Union[cudf.Series, cudf.Index]

Check whether all characters in each string can be converted to a timestamp using the given format.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(["20201101", "192011", "18200111", "2120-11-01"])
>>> s.str.istimestamp("%Y%m%d")
0     True
1    False
2     True
3    False
dtype: bool
isupper()Union[cudf.Series, cudf.Index]

Check whether all characters in each string are uppercase.

This is equivalent to running the Python string method str.isupper() for each element of the Series/Index. If a string has zero characters, False is returned for that check.

ReturnsSeries or Index of bool

Series or Index of boolean values with the same length as the original Series/Index.

See also

isalnum

Check whether all characters are alphanumeric.

isalpha

Check whether all characters are alphabetic.

isdecimal

Check whether all characters are decimal.

isdigit

Check whether all characters are digits.

isinteger

Check whether all characters are integer.

isnumeric

Check whether all characters are numeric.

isfloat

Check whether all characters are float.

islower

Check whether all characters are lowercase.

isspace

Check whether all characters are whitespace.

Examples

>>> import cudf
>>> s = cudf.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
>>> s.str.isupper()
0    False
1    False
2     True
3    False
dtype: bool
join(sep)Union[cudf.Series, cudf.Index]

Join lists contained as elements in the Series/Index with passed delimiter.

RaisesNotImplementedError

Columns of arrays / lists are not yet supported.

len()Union[cudf.Series, cudf.Index]

Computes the length of each element in the Series/Index.

ReturnsSeries or Index of int

A Series or Index of integer values indicating the length of each element in the Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(["dog", "", "\n", None])
>>> s.str.len()
0       3
1       0
2       1
3    <NA>
dtype: int32
ljust(width: int, fillchar: str = ' ')Union[cudf.Series, cudf.Index]

Fill the right side of strings in the Series/Index with an additional character. Equivalent to str.ljust().

Parameters
widthint

Minimum width of resulting string; additional characters will be filled with fillchar.

fillcharstr, default ‘ ‘ (whitespace)

Additional character for filling, default is whitespace.

Returns
Series/Index of str dtype

Returns Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(["hello world", "rapids ai"])
>>> s.str.ljust(10, fillchar="_")
0    hello world
1     rapids ai_
dtype: object
>>> s = cudf.Series(["a", "",  "ab", "__"])
>>> s.str.ljust(1, fillchar="-")
0     a
1     -
2    ab
3    __
dtype: object
lower()Union[cudf.Series, cudf.Index]

Converts all characters to lowercase.

Equivalent to str.lower().

ReturnsSeries or Index of object

A copy of the object with all strings converted to lowercase.

See also

upper

Converts all characters to uppercase.

title

Converts first character of each word to uppercase and remaining to lowercase.

capitalize

Converts first character to uppercase and remaining to lowercase.

swapcase

Converts uppercase to lowercase and lowercase to uppercase.

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s.str.lower()
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object
lstrip(to_strip: Optional[str] = None)Union[cudf.Series, cudf.Index]

Remove leading characters.

Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from left side. Equivalent to str.lstrip().

Parameters
to_stripstr or None, default None

Specifying the set of characters to be removed. All combinations of this set of characters will be stripped. If None then whitespaces are removed.

Returns
Series or Index of object

See also

strip

Remove leading and trailing characters in Series/Index.

rstrip

Remove trailing characters in Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', None])
>>> s.str.lstrip('123.')
0     Ant.
1     Bee!\n
2     Cat?\t
3       <NA>
dtype: object
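
Since the method is documented as equivalent to str.lstrip(), the meaning of to_strip as a set of characters (rather than a prefix) can be checked with built-in strings. This sketch uses plain Python, with None standing in for a null entry.

```python
samples = ["1. Ant.  ", "2. Bee!\n", "3. Cat?\t", None]

# Every leading character found in the set "123." is removed, in any order.
# The space after the dot is not in the set, so stripping stops there.
stripped = [None if s is None else s.lstrip("123.") for s in samples]

# The order of characters in to_strip does not matter.
reordered = "321.x".lstrip("123.")

print(stripped)
print(reordered)
```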
match(pat: str, case: bool = True, flags: int = 0)Union[cudf.Series, cudf.Index]

Determine if each string matches a regular expression.

Parameters
patstr

Character sequence or regular expression.

Returns
Series or Index of boolean values.

Notes

Parameters currently not supported are: case, flags and na.

Examples

>>> import cudf
>>> s = cudf.Series(["rapids", "ai", "cudf"])

Checking for strings starting with a.

>>> s.str.match('a')
0    False
1     True
2    False
dtype: bool

Checking for strings starting with any of a or c.

>>> s.str.match('[ac]')
0    False
1     True
2     True
dtype: bool
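
The examples show that the pattern is anchored at the start of each string, as with Python's re.match(). A plain-Python sketch of that behaviour follows; match_each and search_each are hypothetical helpers, and cuDF's regex engine may not support every feature of Python's re module.

```python
import re


def match_each(strings, pat):
    """True where the pattern matches at the beginning of the string."""
    return [re.match(pat, s) is not None for s in strings]


def search_each(strings, pat):
    """True where the pattern matches anywhere in the string (for contrast)."""
    return [re.search(pat, s) is not None for s in strings]


data = ["rapids", "ai", "cudf"]
print(match_each(data, "a"))     # anchored at the start
print(match_each(data, "[ac]"))
print(search_each(data, "a"))    # "rapids" contains an "a", so it now matches
```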
ngrams(n: int = 2, separator: str = '_')Union[cudf.Series, cudf.Index]

Generate the n-grams from a set of tokens; each record in the Series is treated as a token.

You can generate tokens from a Series instance using the Series.str.tokenize() function.

Parameters
nint

The degree of the n-gram (number of consecutive tokens). Default of 2 for bigrams.

separatorstr

The separator to use between tokens within an n-gram. Default is ‘_’.

Examples

>>> import cudf
>>> str_series = cudf.Series(['this is my', 'favorite book'])
>>> str_series.str.ngrams(2, "_")
0    this is my_favorite book
dtype: object
>>> str_series = cudf.Series(['abc','def','xyz','hhh'])
>>> str_series.str.ngrams(2, "_")
0    abc_def
1    def_xyz
2    xyz_hhh
dtype: object
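
The sliding-window construction behind these outputs can be sketched in a few lines of plain Python. The function below is a hypothetical stand-in operating on a list of tokens, not cuDF's implementation.

```python
def ngrams(tokens, n=2, separator="_"):
    """Join every run of n consecutive tokens with the separator."""
    return [
        separator.join(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
    ]


print(ngrams(["this is my", "favorite book"]))
print(ngrams(["abc", "def", "xyz", "hhh"]))
```

With k tokens the result has k - n + 1 entries, which is why four tokens yield three bigrams and two tokens yield one.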
ngrams_tokenize(n: int = 2, delimiter: str = ' ', separator: str = '_')Union[cudf.Series, cudf.Index]

Generate the n-grams using tokens from each string. This will tokenize each string and then generate ngrams for each string.

Parameters
nint, Default 2.

The degree of the n-gram (number of consecutive tokens).

delimiterstr, Default is white-space.

The character used to locate the split points of each string.

separatorstr, Default is ‘_’.

The separator to use between tokens within an n-gram.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> ser = cudf.Series(['this is the', 'best book'])
>>> ser.str.ngrams_tokenize(n=2, separator='_')
0      this_is
1       is_the
2    best_book
dtype: object
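
The two-step process described above (tokenize each string, then build n-grams per string) can be sketched in plain Python. The function is a hypothetical illustration; dropping empty tokens produced by repeated delimiters is an assumption.

```python
def ngrams_tokenize(strings, n=2, delimiter=" ", separator="_"):
    """Tokenize each string, then emit its n-grams into one flat list."""
    result = []
    for text in strings:
        # Assumption: empty tokens from consecutive delimiters are ignored.
        tokens = [tok for tok in text.split(delimiter) if tok]
        # n-grams never span two input strings.
        result.extend(
            separator.join(tokens[i:i + n])
            for i in range(len(tokens) - n + 1)
        )
    return result


print(ngrams_tokenize(["this is the", "best book"], n=2, separator="_"))
```

Note that no "the_best" bigram is produced: the n-grams are generated within each string, not across string boundaries.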
normalize_characters(do_lower: bool = True)Union[cudf.Series, cudf.Index]

Normalizes string characters for tokenizing.

This uses the normalizer that is built into the subword_tokenize function which includes:

  • adding padding around punctuation (unicode category starts with “P”) as well as certain ASCII symbols like “^” and “$”

  • adding padding around the CJK Unicode block characters

  • changing whitespace (e.g. \t, \n, \r) to space

  • removing control characters (unicode categories “Cc” and “Cf”)

If do_lower is True, lower-casing also removes the accents. The accents cannot be removed from upper-case characters without lower-casing and lower-casing cannot be performed without also removing accents. However, if the accented character is already lower-case, then only the accent is removed.

Parameters
do_lowerbool, Default is True

If set to True, characters will be lower-cased and accents will be removed. If False, accented and upper-case characters are not transformed.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> ser = cudf.Series(["héllo, \tworld","ĂĆCĖÑTED","$99"])
>>> ser.str.normalize_characters()
0    hello ,  world
1          accented
2              $ 99
dtype: object
>>> ser.str.normalize_characters(do_lower=False)
0    héllo ,  world
1          ĂĆCĖÑTED
2              $ 99
dtype: object
normalize_spaces()Union[cudf.Series, cudf.Index]

Remove extra whitespace between tokens and trim whitespace from the beginning and the end of each string.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> ser = cudf.Series(["hello \t world"," test string  "])
>>> ser.str.normalize_spaces()
0    hello world
1    test string
dtype: object
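
A common plain-Python idiom gives the same results for these examples: split on runs of whitespace, then join with single spaces. This is offered as a rough CPU-side equivalent, not as cuDF's implementation.

```python
def normalize_spaces(text):
    """Collapse whitespace runs to one space and trim both ends."""
    # str.split() with no argument splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(text.split())


samples = ["hello \t world", " test string  "]
normalized = [normalize_spaces(s) for s in samples]
print(normalized)
```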
pad(width: int, side: str = 'left', fillchar: str = ' ')Union[cudf.Series, cudf.Index]

Pad strings in the Series/Index up to width.

Parameters
widthint

Minimum width of resulting string; additional characters will be filled with character defined in fillchar.

side{‘left’, ‘right’, ‘both’}, default ‘left’

Side from which to fill resulting string.

fillcharstr, default ‘ ‘ (whitespace)

Additional character for filling, default is whitespace.

Returns
Series/Index of object

Returns a Series or Index in which each string has at least width characters.

See also

rjust

Fills the left side of strings with an arbitrary character. Equivalent to Series.str.pad(side='left').

ljust

Fills the right side of strings with an arbitrary character. Equivalent to Series.str.pad(side='right').

center

Fills both sides of strings with an arbitrary character. Equivalent to Series.str.pad(side='both').

zfill

Pad strings in the Series/Index by prepending ‘0’ character. Equivalent to Series.str.pad(side='left', fillchar='0').

Examples

>>> import cudf
>>> s = cudf.Series(["caribou", "tiger"])
>>> s.str.pad(width=10)
0       caribou
1         tiger
dtype: object
>>> s.str.pad(width=10, side='right', fillchar='-')
0    caribou---
1    tiger-----
dtype: object
>>> s.str.pad(width=10, side='both', fillchar='-')
0    -caribou--
1    --tiger---
dtype: object
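
The relationship between pad and the rjust, ljust and center methods listed under See also can be sketched with Python's built-in string methods. The pad function below is a hypothetical single-string stand-in; it reproduces the example outputs, but it is not cuDF's implementation.

```python
def pad(text, width, side="left", fillchar=" "):
    """Pad one string to at least width characters on the chosen side."""
    if side == "left":
        return text.rjust(width, fillchar)   # fill on the left
    if side == "right":
        return text.ljust(width, fillchar)   # fill on the right
    if side == "both":
        return text.center(width, fillchar)  # fill on both sides
    raise ValueError("side must be 'left', 'right' or 'both'")


for name in ["caribou", "tiger"]:
    print(repr(pad(name, 10)))
    print(repr(pad(name, 10, side="right", fillchar="-")))
    print(repr(pad(name, 10, side="both", fillchar="-")))
```

Strings that are already at least width characters long are returned unchanged, since width is a minimum.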
partition(sep: str = ' ', expand: bool = True)Union[cudf.Series, cudf.Index]

Split the string at the first occurrence of sep.

This method splits the string at the first occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing the string itself, followed by two empty strings.

Parameters
sepstr, default ‘ ‘ (whitespace)

String to split on.

Returns
DataFrame or MultiIndex

Returns a DataFrame / MultiIndex

See also

rpartition

Split the string at the last occurrence of sep.

split

Split strings around given separators.

Notes

The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Examples

>>> import cudf
>>> s = cudf.Series(['Linda van der Berg', 'George Pitt-Rivers'])
>>> s
0    Linda van der Berg
1    George Pitt-Rivers
dtype: object
>>> s.str.partition()
        0  1             2
0   Linda     van der Berg
1  George      Pitt-Rivers

To partition by something different than a space:

>>> s.str.partition('-')
                    0  1       2
0  Linda van der Berg
1         George Pitt  -  Rivers

Also available on indices:

>>> idx = cudf.core.index.StringIndex(['X 123', 'Y 999'])
>>> idx
StringIndex(['X 123' 'Y 999'], dtype='object')

Which will create a MultiIndex:

>>> idx.str.partition()
MultiIndex([('X', ' ', '123'),
            ('Y', ' ', '999')],
           )
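
The three-element results, including the case where the separator is not found, follow the same convention as Python's built-in str.partition() and str.rpartition(). The plain-Python snippet below illustrates that convention with tuples rather than a DataFrame.

```python
names = ["Linda van der Berg", "George Pitt-Rivers"]

# Split at the first space: (before, separator, after).
first_space = [s.partition(" ") for s in names]

# "Linda van der Berg" has no hyphen, so the string is followed by two
# empty strings.
first_hyphen = [s.partition("-") for s in names]

# rpartition splits at the last occurrence instead.
last_space = [s.rpartition(" ") for s in names]

print(first_space)
print(first_hyphen)
print(last_space)
```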
porter_stemmer_measure()Union[cudf.Series, cudf.Index]

Compute the Porter Stemmer measure for each string. The measure is defined as part of the Porter Stemmer algorithm.

Returns
Series or Index of int.

Examples

>>> import cudf
>>> ser = cudf.Series(["hello", "super"])
>>> ser.str.porter_stemmer_measure()
0    1
1    2
dtype: int32
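
In Porter's notation a word has the form [C](VC)^m[V], where C and V are runs of consonants and vowels, and the measure is m. A minimal pure-Python sketch of that definition follows; it assumes lower-case ASCII words and is not cuDF's implementation.

```python
def porter_measure(word):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""

    def is_vowel(i):
        ch = word[i]
        if ch in "aeiou":
            return True
        # "y" is a vowel only when it follows a consonant.
        if ch == "y":
            return i > 0 and not is_vowel(i - 1)
        return False

    kinds = [is_vowel(i) for i in range(len(word))]
    # Each vowel immediately followed by a consonant closes one VC group.
    return sum(1 for prev, cur in zip(kinds, kinds[1:]) if prev and not cur)


for word in ["hello", "super", "tree", "trouble"]:
    print(word, porter_measure(word))
```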
replace(pat: Union[str, Sequence], repl: Union[str, Sequence], n: int = - 1, case=None, flags: int = 0, regex: bool = True)Union[cudf.Series, cudf.Index]

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

Parameters
patstr or list-like

String(s) to be replaced as a character sequence or regular expression.

replstr or list-like

String(s) to be used as replacement.

nint, default -1 (all)

Number of replacements to make from the start.

regexbool, default True

If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.

Returns
Series/Index of str dtype

A copy of the object with all matching occurrences of pat replaced by repl.

Notes

The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Examples

>>> import cudf
>>> s = cudf.Series(['foo', 'fuz', None])
>>> s
0     foo
1     fuz
2    <NA>
dtype: object

When pat is a string and regex is True (the default), the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). Null values (<NA>) in the Series are left as is:

>>> s.str.replace('f.', 'ba', regex=True)
0     bao
1     baz
2    <NA>
dtype: object

When pat is a string and regex is False, every pat is replaced with repl as with str.replace():

>>> s.str.replace('f.', 'ba', regex=False)
0     foo
1     fuz
2    <NA>
dtype: object
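
The difference between the two modes mirrors re.sub() versus str.replace() in plain Python. The sketch below is a hypothetical single-string helper; treating any non-positive n as "replace all" is an assumption based on the documented default of -1.

```python
import re


def replace(text, pat, repl, n=-1, regex=True):
    """Replace pat with repl in one string; None is passed through."""
    if text is None:
        return None
    if regex:
        # re.sub uses count=0 to mean "replace every match".
        return re.sub(pat, repl, text, count=n if n > 0 else 0)
    # str.replace uses a negative count to mean "replace every occurrence".
    return text.replace(pat, repl, n if n > 0 else -1)


data = ["foo", "fuz", None]
print([replace(s, "f.", "ba", regex=True) for s in data])
print([replace(s, "f.", "ba", regex=False) for s in data])
```

With regex=False the pattern "f." only matches a literal "f" followed by a literal dot, so neither "foo" nor "fuz" changes.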
replace_tokens(targets, replacements, delimiter: Optional[str] = None)Union[cudf.Series, cudf.Index]

The target tokens are searched for within each string in the series and replaced with the corresponding replacements if found. Tokens are identified by the delimiter character provided.

Parameters
targetsarray-like, Sequence or Series

The tokens to search for inside each string.

replacementsarray-like, Sequence, Series or str

The replacement string for each target token found. Alternatively, this can be a single str, which is then used as the replacement for every target token found.

delimiterstr

The character used to locate the tokens of each string. Default is whitespace.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> targets = cudf.Series(["is", "me"])
>>> sr.str.replace_tokens(targets=targets, replacements="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.replace_tokens(targets=targets, replacements=":")
0     this;is;me
1    theme;music
2
dtype: object
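
The examples show that only whole tokens are replaced: "theme" is untouched even though it contains "me", and nothing changes when the strings are not split by the active delimiter. A pure-Python sketch of that behaviour follows; the helper is hypothetical and works on one string at a time.

```python
import re


def replace_tokens(text, targets, replacements, delimiter=None):
    """Replace whole tokens of text that appear in targets."""
    if isinstance(replacements, str):
        # A single replacement string applies to every target.
        replacements = [replacements] * len(targets)
    mapping = dict(zip(targets, replacements))
    # Keep the delimiters (capturing group) so the string can be rebuilt.
    splitter = r"(\s+)" if delimiter is None else "(" + re.escape(delimiter) + ")"
    parts = re.split(splitter, text)
    return "".join(mapping.get(part, part) for part in parts)


targets = ["is", "me"]
print([replace_tokens(s, targets, "_") for s in ["this is me", "theme music", ""]])
print([replace_tokens(s, targets, ":") for s in ["this;is;me", "theme;music", ""]])
```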
replace_with_backrefs(pat: str, repl: str)Union[cudf.Series, cudf.Index]

Use the repl back-ref template to create a new string with the extracted elements found using the pat expression.

Parameters
patstr

Regex with groupings to identify extract sections. This should not be a compiled regex.

replstr

String template containing back-reference indicators.

Returns
Series/Index of str dtype

Examples

>>> import cudf
>>> s = cudf.Series(["A543","Z756"])
>>> s.str.replace_with_backrefs('(\\d)(\\d)', 'V\\2\\1')
0    AV453
1    ZV576
dtype: object
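
The same template works with Python's re.sub(), which makes the substitution easy to trace on the CPU. This plain-Python snippet is for illustration and does not call cuDF.

```python
import re

samples = ["A543", "Z756"]

# The first two adjacent digits are captured as groups 1 and 2, then
# written back in swapped order after a literal "V". The third digit has
# no partner left to pair with, so it is kept unchanged.
swapped = [re.sub(r"(\d)(\d)", r"V\2\1", s) for s in samples]
print(swapped)
```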
rfind(sub: str, start: int = 0, end: Optional[int] = None)Union[cudf.Series, cudf.Index]

Return the highest index in each string in the Series/Index where the substring is fully contained within [start:end]. Return -1 on failure. Equivalent to standard str.rfind().

Parameters
substr

Substring being searched.

startint

Left edge index.

endint

Right edge index.

Returns
Series or Index of int

See also

find

Return the lowest index in each string.

Examples

>>> import cudf
>>> s = cudf.Series(["abc", "hello world", "rapids ai"])
>>> s.str.rfind('a')
0    0
1   -1
2    7
dtype: int32

Using start and end parameters.

>>> s.str.rfind('a', start=2, end=5)
0   -1
1   -1
2   -1
dtype: int32
rindex(sub: str, start: int = 0, end: Optional[int] = None)Union[cudf.Series, cudf.Index]

Return the highest index in each string where the substring is fully contained within [start:end]. This is the same as str.rfind except instead of returning -1, it raises a ValueError when the substring is not found.

Parameters
substr

Substring being searched.

startint

Left edge index.

endint

Right edge index.

Returns
Series or Index of int

Examples

>>> import cudf
>>> s = cudf.Series(['abc', 'a','b' ,'ddb'])
>>> s.str.rindex('b')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found

Parameters such as start and end can also be used.

>>> s = cudf.Series(['abc', 'abb','ab' ,'ddb'])
>>> s.str.rindex('b', start=1, end=5)
0    1
1    2
2    1
3    2
dtype: int32
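
The contrast between rfind and rindex matches Python's built-in str.rfind() and str.rindex(). The list-based helpers below are hypothetical and only sketch the documented behaviour: a single string without the substring is enough to raise for the whole call.

```python
def rfind_each(strings, sub, start=0, end=None):
    """Highest index of sub in each string, or -1 when absent."""
    return [s.rfind(sub, start, end) for s in strings]


def rindex_each(strings, sub, start=0, end=None):
    """Like rfind_each, but raises ValueError if any string lacks sub."""
    return [s.rindex(sub, start, end) for s in strings]


print(rfind_each(["abc", "hello world", "rapids ai"], "a"))
print(rindex_each(["abc", "abb", "ab", "ddb"], "b", start=1, end=5))

try:
    rindex_each(["abc", "a", "b", "ddb"], "b")
except ValueError as err:
    print("ValueError:", err)
```

Returned positions are relative to the start of the full string, not to start.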
rjust(width: int, fillchar: str = ' ')Union[cudf.Series, cudf.Index]

Fill the left side of strings in the Series/Index with an additional character. Equivalent to str.rjust().

Parameters
widthint

Minimum width of resulting string; additional characters will be filled with fillchar.

fillcharstr, default ‘ ‘ (whitespace)

Additional character for filling, default is whitespace.

Returns
Series/Index of str dtype

Returns Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(["hello world", "rapids ai"])
>>> s.str.rjust(20, fillchar="_")
0    _________hello world
1    ___________rapids ai
dtype: object
>>> s = cudf.Series(["a", "",  "ab", "__"])
>>> s.str.rjust(1, fillchar="-")
0     a
1     -
2    ab
3    __
dtype: object
rpartition(sep: str = ' ', expand: bool = True)Union[cudf.Series, cudf.Index]

Split the string at the last occurrence of sep.

This method splits the string at the last occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing two empty strings, followed by the string itself.

Parameters
sepstr, default ‘ ‘ (whitespace)

String to split on.

Returns
DataFrame or MultiIndex

Returns a DataFrame / MultiIndex

Notes

The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Examples

>>> import cudf
>>> s = cudf.Series(['Linda van der Berg', 'George Pitt-Rivers'])
>>> s
0    Linda van der Berg
1    George Pitt-Rivers
dtype: object
>>> s.str.rpartition()
            0  1            2
0  Linda van der            Berg
1         George     Pitt-Rivers

Also available on indices:

>>> idx = cudf.core.index.StringIndex(['X 123', 'Y 999'])
>>> idx
StringIndex(['X 123' 'Y 999'], dtype='object')

Which will create a MultiIndex:

>>> idx.str.rpartition()
MultiIndex([('X', ' ', '123'),
            ('Y', ' ', '999')],
           )
rsplit(pat: Optional[str] = None, n: int = - 1, expand: Optional[bool] = None)Union[cudf.Series, cudf.Index]

Split strings around given separator/delimiter.

Splits the string in the Series/Index from the end, at the specified delimiter string. Equivalent to str.rsplit().

Parameters
patstr, default ‘ ‘ (space)

String to split on, does not yet support regular expressions.

nint, default -1 (all)

Limit number of splits in output. None, 0, and -1 will all be interpreted as “all splits”.

expandbool, default False

Expand the split strings into separate columns.

  • If True, return DataFrame/MultiIndex expanding dimensionality.

  • If False, return Series/Index, containing lists of strings.

Returns
Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).

See also

split

Split strings around given separator/delimiter.

str.split

Standard library version for split.

str.rsplit

Standard library version for rsplit.

Notes

The handling of the n keyword depends on the number of found splits:

  • If found splits > n, make first n splits only

  • If found splits <= n, make all splits

  • If for a certain row the number of found splits < n, append None for padding up to n if expand=True.

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

Examples

>>> import cudf
>>> s = cudf.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         None
...     ]
... )
>>> s
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                             <NA>
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list

The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.

>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list
>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: list

When using expand=True, the split elements will expand out into separate columns. If <NA> value is present, it is propagated throughout the columns during the split.

>>> s.str.rsplit(n=2, expand=True)
                                               0        1         2
0                                      this is a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html     <NA>      <NA>
2                                           <NA>     <NA>      <NA>

For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.

>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        <NA>
1  https://docs.python.org/3/tutorial  index.html
2                                <NA>        <NA>
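
The effect of n on the direction of splitting is the same as in Python's built-in str.rsplit() and str.split(), so it can be checked with plain strings. The snippet below is illustrative and does not call cuDF.

```python
sentence = "this is a regular sentence"
url = "https://docs.python.org/3/tutorial/index.html"

# With a limit of 2, rsplit consumes separators from the right ...
from_right = sentence.rsplit(None, 2)
# ... while split consumes them from the left.
from_left = sentence.split(None, 2)

# One split from the right separates the document name from its path.
parent, document = url.rsplit("/", 1)

print(from_right)
print(from_left)
print(parent, document)
```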
rstrip(to_strip: Optional[str] = None)Union[cudf.Series, cudf.Index]

Remove trailing characters.

Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from right side. Equivalent to str.rstrip().

Parameters
to_stripstr or None, default None

Specifying the set of characters to be removed. All combinations of this set of characters will be stripped. If None then whitespaces are removed.

Returns
Series/Index of str dtype

Returns Series or Index.

See also

strip

Remove leading and trailing characters in Series/Index.

lstrip

Remove leading characters in Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', None])
>>> s
0    1. Ant.
1    2. Bee!\n
2    3. Cat?\t
3         <NA>
dtype: object
>>> s.str.rstrip('.!? \n\t')
0    1. Ant
1    2. Bee
2    3. Cat
3      <NA>
dtype: object
slice(start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None)Union[cudf.Series, cudf.Index]

Slice substrings from each element in the Series or Index.

Parameters
startint, optional

Start position for slice operation.

stopint, optional

Stop position for slice operation.

stepint, optional

Step size for slice operation.

Returns
Series/Index of str dtype

Series or Index from sliced substring from original string object.

See also

slice_replace

Replace a slice with a string.

get

Return element at position. Equivalent to Series.str.slice(start=i, stop=i+1) with i being the position.

Examples

>>> import cudf
>>> s = cudf.Series(["koala", "fox", "chameleon"])
>>> s
0        koala
1          fox
2    chameleon
dtype: object
>>> s.str.slice(start=1)
0        oala
1          ox
2    hameleon
dtype: object
>>> s.str.slice(start=-1)
0    a
1    x
2    n
dtype: object
>>> s.str.slice(stop=2)
0    ko
1    fo
2    ch
dtype: object
>>> s.str.slice(step=2)
0      kaa
1       fx
2    caeen
dtype: object
>>> s.str.slice(start=0, stop=5, step=3)
0    kl
1     f
2    cm
dtype: object
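
For these examples the parameters behave like Python's slice notation text[start:stop:step] applied to every element. A minimal list-based sketch, with a hypothetical helper name:

```python
def slice_each(strings, start=None, stop=None, step=None):
    """Apply the same Python slice to every string."""
    return [s[start:stop:step] for s in strings]


animals = ["koala", "fox", "chameleon"]
print(slice_each(animals, start=1))
print(slice_each(animals, start=-1))
print(slice_each(animals, stop=2))
print(slice_each(animals, step=2))
print(slice_each(animals, start=0, stop=5, step=3))
```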
slice_from(starts: cudf.Series, stops: cudf.Series)Union[cudf.Series, cudf.Index]

Return substring of each string using positions for each string.

The starts and stops parameters are of Series type.

Parameters
startsSeries

Beginning position of each string to extract. Default is the beginning of each string.

stopsSeries

Ending position of each string to extract. Default is the end of each string. Use -1 to specify the end of that string.

Returns
Series/Index of str dtype

A substring of each string using positions for each string.

Examples

>>> import cudf
>>> s = cudf.Series(["hello","there"])
>>> s
0    hello
1    there
dtype: object
>>> starts = cudf.Series([1, 3])
>>> stops = cudf.Series([5, 5])
>>> s.str.slice_from(starts, stops)
0    ello
1      re
dtype: object
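
Unlike slice, each string gets its own pair of positions. The per-row logic can be sketched in plain Python as follows; the helper is hypothetical and takes lists in place of Series.

```python
def slice_from(strings, starts, stops):
    """Slice each string with its own start and stop position."""
    result = []
    for text, start, stop in zip(strings, starts, stops):
        # Per the parameter description, a stop of -1 means "to the end of
        # the string" (unlike Python slicing, where -1 drops the last
        # character).
        end = len(text) if stop == -1 else stop
        result.append(text[start:end])
    return result


words = ["hello", "there"]
print(slice_from(words, [1, 3], [5, 5]))
print(slice_from(words, [1, 3], [-1, -1]))
```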
slice_replace(start: Optional[int] = None, stop: Optional[int] = None, repl: Optional[str] = None)Union[cudf.Series, cudf.Index]

Replace the specified section of each string with a new string.

Parameters
startint, optional

Beginning position of the string to replace. Default is the beginning of each string.

stopint, optional

Ending position of the string to replace. Default is end of each string.

replstr, optional

String to insert into the specified position values.

Returns
Series/Index of str dtype

A new string with the specified section of the string replaced with repl string.

See also

slice

Just slicing without replacement.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'ab', 'abc', 'abdc', 'abcde'])
>>> s
0        a
1       ab
2      abc
3     abdc
4    abcde
dtype: object

Specify just start, meaning replace start until the end of the string with repl.

>>> s.str.slice_replace(1, repl='X')
0    aX
1    aX
2    aX
3    aX
4    aX
dtype: object

Specify just stop, meaning the start of the string to stop is replaced with repl, and the rest of the string is included.

>>> s.str.slice_replace(stop=2, repl='X')
0       X
1       X
2      Xc
3     Xdc
4    Xcde
dtype: object

Specify start and stop, meaning the slice from start to stop is replaced with repl. Everything before or after start and stop is included as is.

>>> s.str.slice_replace(start=1, stop=3, repl='X')
0      aX
1      aX
2      aX
3     aXc
4    aXde
dtype: object
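
All three cases above reduce to one rule: keep everything before start, insert repl, then keep everything from stop onward. A plain-Python sketch of that rule for a single string follows; it assumes non-negative positions and treats a missing repl as an empty string, and the helper name is hypothetical.

```python
def slice_replace(text, start=None, stop=None, repl=None):
    """Replace text[start:stop] with repl in one string."""
    begin = 0 if start is None else start
    end = len(text) if stop is None else stop
    # Assumption: omitting repl removes the slice without inserting anything.
    return text[:begin] + (repl or "") + text[end:]


data = ["a", "ab", "abc", "abdc", "abcde"]
print([slice_replace(s, 1, repl="X") for s in data])
print([slice_replace(s, stop=2, repl="X") for s in data])
print([slice_replace(s, start=1, stop=3, repl="X") for s in data])
```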
split(pat: Optional[str] = None, n: int = - 1, expand: Optional[bool] = None)Union[cudf.Series, cudf.Index]

Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string. Equivalent to str.split().

Parameters
patstr, default ‘ ‘ (space)

String to split on, does not yet support regular expressions.

nint, default -1 (all)

Limit number of splits in output. None, 0, and -1 will all be interpreted as “all splits”.

expandbool, default False

Expand the split strings into separate columns.

  • If True, return DataFrame/MultiIndex expanding dimensionality.

  • If False, return Series/Index, containing lists of strings.

Returns
Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).

See also

rsplit

Splits string around given separator/delimiter, starting from the right.

str.split

Standard library version for split.

str.rsplit

Standard library version for rsplit.

Notes

The handling of the n keyword depends on the number of found splits:

  • If found splits > n, make first n splits only

  • If found splits <= n, make all splits

  • If for a certain row the number of found splits < n, append None for padding up to n if expand=True.

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

Examples

>>> import cudf
>>> data = ["this is a regular sentence",
...     "https://docs.python.org/index.html", None]
>>> s = cudf.Series(data)
>>> s
0            this is a regular sentence
1    https://docs.python.org/index.html
2                                  <NA>
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.split()
0        [this, is, a, regular, sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit()
0        [this, is, a, regular, sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list

The n parameter can be used to limit the number of splits on the delimiter.

>>> s.str.split(n=2)
0          [this, is, a regular sentence]
1    [https://docs.python.org/index.html]
2                                    None
dtype: list

The pat parameter can be used to split by other characters.

>>> s.str.split(pat="/")
0               [this is a regular sentence]
1    [https:, , docs.python.org, index.html]
2                                       None
dtype: list

When using expand=True, the split elements will expand out into separate columns. If <NA> value is present, it is propagated throughout the columns during the split.

>>> s.str.split(expand=True)
                                    0     1     2        3         4
0                                this    is     a  regular  sentence
1  https://docs.python.org/index.html  <NA>  <NA>     <NA>      <NA>
2                                <NA>  <NA>  <NA>     <NA>      <NA>
startswith(pat: Union[str, Sequence])Union[cudf.Series, cudf.Index]

Test if the start of each string element matches a pattern.

Equivalent to str.startswith().

Parameters
patstr or list-like

If pat is a str, evaluates whether each string of the Series starts with pat. If pat is list-like, evaluates whether self[i] starts with pat[i]. Regular expressions are not accepted.

Returns
Series or Index of bool

A Series of booleans indicating whether the given pattern matches the start of each string element.

See also

endswith

Same as startswith, but tests the end of string.

contains

Tests if string element contains a pattern.

Examples

>>> import cudf
>>> s = cudf.Series(['bat', 'Bear', 'cat', None])
>>> s
0     bat
1    Bear
2     cat
3    <NA>
dtype: object
>>> s.str.startswith('b')
0     True
1    False
2    False
3     <NA>
dtype: bool
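
The list-like form of pat pairs each string with its own prefix. A minimal plain-Python sketch of that element-wise rule follows; the helper startswith_each is hypothetical, and None stands in for <NA>.

```python
def startswith_each(values, pats):
    """Compare values[i] against pats[i]; null inputs stay null (None)."""
    return [
        None if v is None else v.startswith(p)
        for v, p in zip(values, pats)
    ]


result = startswith_each(["bat", "Bear", "cat", None], ["b", "B", "a", "x"])
```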
strip(to_strip: Optional[str] = None)Union[cudf.Series, cudf.Index]

Remove leading and trailing characters.

Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from left and right sides. Equivalent to str.strip().

Parameters
to_stripstr or None, default None

Specifies the set of characters to be removed. All combinations of this set of characters will be stripped. If None, whitespace is removed.

Returns
Series/Index of str dtype

Returns Series or Index.

See also

lstrip

Remove leading characters in Series/Index.

rstrip

Remove trailing characters in Series/Index.

Examples

>>> import cudf
>>> s = cudf.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', None])
>>> s
0    1. Ant.
1    2. Bee!\n
2    3. Cat?\t
3         <NA>
dtype: object
>>> s.str.strip()
0    1. Ant.
1    2. Bee!
2    3. Cat?
3       <NA>
dtype: object
>>> s.str.strip('123.!? \n\t')
0     Ant
1     Bee
2     Cat
3    <NA>
dtype: object
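
Because to_strip is treated as a set of characters rather than a literal prefix or suffix, the order in which the characters are listed does not matter. The built-in str.strip(), which this method is documented as equivalent to, shows this; the sketch below uses plain Python strings rather than a cuDF Series.

```python
samples = ["1. Ant.  ", "2. Bee!\n", "3. Cat?\t"]
chars = "123.!? \n\t"

# Any leading or trailing run made up of these characters is removed.
stripped = [s.strip(chars) for s in samples]

# Reordering the characters gives the same result: the argument is a set.
reordered = [s.strip("".join(reversed(chars))) for s in samples]
```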
subword_tokenize(hash_file: str, max_length: int = 64, stride: int = 48, do_lower: bool = True, do_truncate: bool = False, max_rows_tensor: int = 500)Tuple[cupy.ndarray, cupy.ndarray, cupy.ndarray]

Run CUDA BERT subword tokenizer on cuDF strings column. Encodes words to token ids using vocabulary from a pretrained tokenizer.

This function requires about 21x the number of character bytes in the input strings column as working memory.

Parameters
hash_filestr

Path to hash file containing vocabulary of words with token-ids. This can be created from the raw vocabulary using the cudf.utils.hash_vocab_utils.hash_vocab function.

max_lengthint, Default is 64

Limits the length of the sequence returned. If tokenized string is shorter than max_length, output will be padded with 0s. If the tokenized string is longer than max_length and do_truncate == False, there will be multiple returned sequences containing the overflowing token-ids.

strideint, Default is 48

If do_truncate == False and the tokenized string is larger than max_length, the sequences containing the overflowing token-ids can contain duplicated token-ids from the main sequence. If max_length is equal to stride there are no duplicated-id tokens. If stride is 80% of max_length, 20% of the first sequence will be repeated on the second sequence and so on until the entire sentence is encoded.

do_lowerbool, Default is True

If set to true, original text will be lowercased before encoding.

do_truncatebool, Default is False

If set to true, strings will be truncated and padded to max_length. Each input string will result in exactly one output sequence. If set to false, there may be multiple output sequences when the max_length is smaller than generated tokens.

max_rows_tensorint, Default is 500

Maximum number of rows for the output token-ids expected to be generated by the tokenizer. Used for allocating temporary working memory on the GPU device. If the output generates a larger number of rows, behavior is undefined. This will vary based on stride, truncation, and max_length. For example, for non-overlapping sequences output rows will be the same as input rows.

Returns
token-idscupy.ndarray

The token-ids for each string padded with 0s to max_length.

attention-maskcupy.ndarray

The mask for token-ids result where corresponding positions identify valid token-id values.

metadatacupy.ndarray

Each row contains the index id of the original string and the first and last index of the token-ids that are non-padded and non-overlapping.

Examples

>>> import cudf
>>> from cudf.utils.hash_vocab_utils import hash_vocab
>>> hash_vocab('bert-base-uncased-vocab.txt', 'voc_hash.txt')
>>> ser = cudf.Series(['this is the', 'best book'])
>>> stride, max_length = 8, 8
>>> max_rows_tensor = len(ser)
>>> tokens, masks, metadata = ser.str.subword_tokenize('voc_hash.txt',
... max_length=max_length, stride=stride,
... max_rows_tensor=max_rows_tensor)
>>> tokens.reshape(-1, max_length)
array([[2023, 2003, 1996,    0,    0,    0,    0,    0],
       [2190, 2338,    0,    0,    0,    0,    0,    0]], dtype=uint32)
>>> masks.reshape(-1, max_length)
array([[1, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0]], dtype=uint32)
>>> metadata.reshape(-1, 3)
array([[0, 0, 2],
       [1, 0, 1]], dtype=uint32)
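
The interaction of max_length and stride when do_truncate is False can be pictured with a simplified plain-Python model. This is an assumption-laden sketch of the description in the stride parameter, not the CUDA tokenizer itself: the helper overflow_windows is hypothetical, the token ids are made up, and the exact row boundaries produced by subword_tokenize may differ.

```python
def overflow_windows(token_ids, max_length, stride, pad_id=0):
    """Cut token_ids into rows of max_length that start every `stride` tokens."""
    rows = []
    start = 0
    while True:
        row = token_ids[start:start + max_length]
        # Short rows are padded with pad_id up to max_length.
        rows.append(row + [pad_id] * (max_length - len(row)))
        if start + max_length >= len(token_ids):
            break  # this row reached the end of the input
        start += stride
    return rows


ids = list(range(1, 11))  # ten made-up token ids

# stride == max_length: consecutive rows share no token ids.
no_overlap = overflow_windows(ids, max_length=8, stride=8)

# stride < max_length: the tail of one row is repeated at the head of the next.
overlap = overflow_windows(ids, max_length=8, stride=6)
```

In this model the second row of overlap starts with ids 7 and 8, which already appear at the end of the first row.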
swapcase()Union[cudf.Series, cudf.Index]

Change each lowercase character to uppercase and vice versa. This only applies to ASCII characters at this time.

Equivalent to str.swapcase().

Returns : Series or Index of object

See also

lower

Converts all characters to lowercase.

upper

Converts all characters to uppercase.

title

Converts first character of each word to uppercase and remaining to lowercase.

capitalize

Converts first character to uppercase and remaining to lowercase.

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object
title()Union[cudf.Series, cudf.Index]

Uppercase the first letter of each word (the letter that follows a space) and lowercase the rest. This only applies to ASCII characters at this time.

Equivalent to str.title().

Returns : Series or Index of object

See also

lower

Converts all characters to lowercase.

upper

Converts all characters to uppercase.

capitalize

Converts first character to uppercase and remaining to lowercase.

swapcase

Converts uppercase to lowercase and lowercase to uppercase.

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object
token_count(delimiter: str = ' ')Union[cudf.Series, cudf.Index]

Each string is split into tokens using the provided delimiter. The returned integer sequence is the number of tokens in each string.

Parameters
delimiterstr or list of strs, Default is whitespace.

The characters or strings used to locate the split points of each string.

Returns
Series or Index.

Examples

>>> import cudf
>>> ser = cudf.Series(["hello world","goodbye",""])
>>> ser.str.token_count()
0    2
1    1
2    0
dtype: int32
tokenize(delimiter: str = ' ')Union[cudf.Series, cudf.Index]

Each string is split into tokens using the provided delimiter(s). The sequence returned contains the tokens in the order they were found.

Parameters
delimiterstr or list of strs, Default is whitespace.

The string used to locate the split points of each string.

Returns
Series or Index of object.

Examples

>>> import cudf
>>> data = ["hello world", "goodbye world", "hello goodbye"]
>>> ser = cudf.Series(data)
>>> ser.str.tokenize()
0      hello
1      world
2    goodbye
3      world
4      hello
5    goodbye
dtype: object
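
The relationship between token_count and tokenize can be sketched with plain Python strings: the first reports how many tokens each string yields, the second flattens all tokens into one sequence. The sketch assumes the default whitespace delimiter behaves like str.split() with no arguments.

```python
data = ["hello world", "goodbye world", "hello goodbye"]

# token_count: one integer per input string (an empty string has no tokens).
counts = [len(s.split()) for s in data + [""]]

# tokenize: all tokens in the order they were found, in a single flat sequence.
tokens = [tok for s in data for tok in s.split()]
```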
translate(table: dict)Union[cudf.Series, cudf.Index]

Map all characters in the string through the given mapping table.

Equivalent to standard str.translate().

Parameters
tabledict

Table is a mapping of Unicode ordinals to Unicode ordinals, strings, or None. Unmapped characters are left untouched. str.maketrans() is a helper function for making translation tables.

Returns
Series or Index.

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence','SwApCaSe']
>>> s = cudf.Series(data)
>>> s.str.translate({'a': "1"})
0                 lower
1              CAPITALS
2    this is 1 sentence
3              SwApC1Se
dtype: object
>>> s.str.translate({'a': "1", "e":"#"})
0                 low#r
1              CAPITALS
2    this is 1 s#nt#nc#
3              SwApC1S#
dtype: object
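
For comparison, the same mapping can be built with the str.maketrans() helper mentioned above and applied with the built-in str.translate(). The sketch uses plain Python strings only; str.maketrans() converts single-character keys into the Unicode ordinals that str.translate() expects.

```python
table = str.maketrans({"a": "1", "e": "#"})

data = ["lower", "CAPITALS", "this is a sentence", "SwApCaSe"]
translated = [s.translate(table) for s in data]
```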
upper()Union[cudf.Series, cudf.Index]

Convert each string to uppercase. This only applies to ASCII characters at this time.

Equivalent to str.upper().

Returns : Series or Index of object

See also

lower

Converts all characters to lowercase.

upper

Converts all characters to uppercase.

title

Converts first character of each word to uppercase and remaining to lowercase.

capitalize

Converts first character to uppercase and remaining to lowercase.

swapcase

Converts uppercase to lowercase and lowercase to uppercase.

Examples

>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object
url_decode()Union[cudf.Series, cudf.Index]

Returns a URL-decoded format of each string. No format checking is performed. All characters are expected to be encoded as UTF-8 hex values.

Returns
Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(['A%2FB-C%2FD', 'e%20f.g', '4-5%2C6'])
>>> s.str.url_decode()
0    A/B-C/D
1      e f.g
2      4-5,6
dtype: object
>>> data = ["https%3A%2F%2Frapids.ai%2Fstart.html",
...     "https%3A%2F%2Fmedium.com%2Frapids-ai"]
>>> s = cudf.Series(data)
>>> s.str.url_decode()
0    https://rapids.ai/start.html
1    https://medium.com/rapids-ai
dtype: object
url_encode()Union[cudf.Series, cudf.Index]

Returns a URL-encoded format of each string. No format checking is performed. All characters are encoded except for ASCII letters, digits, and these characters: ‘.’,’_’,’-‘,’~’. Encoding converts to hex using UTF-8 encoded bytes.

Returns
Series or Index.

Examples

>>> import cudf
>>> s = cudf.Series(['A/B-C/D', 'e f.g', '4-5,6'])
>>> s.str.url_encode()
0    A%2FB-C%2FD
1        e%20f.g
2        4-5%2C6
dtype: object
>>> data = ["https://rapids.ai/start.html",
...     "https://medium.com/rapids-ai"]
>>> s = cudf.Series(data)
>>> s.str.url_encode()
0    https%3A%2F%2Frapids.ai%2Fstart.html
1    https%3A%2F%2Fmedium.com%2Frapids-ai
dtype: object
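
The encoding rule described above (everything except ASCII letters, digits, and '.', '_', '-', '~' is percent-encoded from its UTF-8 bytes) matches what the standard library produces when no extra safe characters are allowed. The following sketch uses urllib.parse on plain Python strings as a point of comparison; it does not call cuDF.

```python
from urllib.parse import quote, unquote

raw = ["A/B-C/D", "e f.g", "4-5,6", "https://rapids.ai/start.html"]

# safe="" so that "/" is encoded too; letters, digits and "_.-~" are never quoted.
encoded = [quote(s, safe="") for s in raw]

# Decoding reverses the percent-encoding.
decoded = [unquote(s) for s in encoded]
```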
wrap(width: int, **kwargs)Union[cudf.Series, cudf.Index]

Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.

Parameters
widthint

Maximum line width.

Returns
Series or Index

Notes

The parameters expand_tabs, replace_whitespace, drop_whitespace, break_long_words, and break_on_hyphens are not yet supported and will raise a NotImplementedError if they are set to any value.

This method currently achieves behavior matching the str_wrap function of R’s stringr library. The equivalent pandas implementation can be obtained using the following parameter settings:

expand_tabs = False

replace_whitespace = True

drop_whitespace = True

break_long_words = False

break_on_hyphens = False

Examples

>>> import cudf
>>> data = ['line to be wrapped', 'another line to be wrapped']
>>> s = cudf.Series(data)
>>> s.str.wrap(12)
0             line to be\nwrapped
1    another line\nto be\nwrapped
dtype: object
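
The parameter settings listed in the Notes share their names with keyword arguments of Python's textwrap module. Assuming they carry the same meaning there, the following plain-Python sketch reproduces the line breaks of the example above without using cuDF.

```python
import textwrap

data = ["line to be wrapped", "another line to be wrapped"]

wrapped = [
    textwrap.fill(
        s,
        width=12,
        expand_tabs=False,
        replace_whitespace=True,
        drop_whitespace=True,
        break_long_words=False,
        break_on_hyphens=False,
    )
    for s in data
]
```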
zfill(width: int)Union[cudf.Series, cudf.Index]

Pad strings in the Series/Index by prepending ‘0’ characters.

Strings in the Series/Index are padded with ‘0’ characters on the left of the string to reach a total string length width. Strings in the Series/Index with length greater or equal to width are unchanged.

Parameters
widthint

Minimum length of resulting string; strings with length less than width will be prepended with ‘0’ characters.

Returns
Series/Index of str dtype

Returns Series or Index with prepended ‘0’ characters.

See also

rjust

Fills the left side of strings with an arbitrary character.

ljust

Fills the right side of strings with an arbitrary character.

pad

Fills the specified sides of strings with an arbitrary character.

center

Fills both sides of strings with an arbitrary character.

Notes

Differs from str.zfill() which has special handling for ‘+’/’-‘ in the string.

Examples

>>> import cudf
>>> s = cudf.Series(['-1', '1', '1000',  None])
>>> s
0      -1
1       1
2    1000
3    <NA>
dtype: object

Note that None is not a string, so it stays null and is shown as <NA>. The minus sign in '-1' is treated as a regular character and the zero is added to the left of it (str.zfill() would have moved it to the left). 1000 remains unchanged as it is longer than width.

>>> s.str.zfill(3)
0     0-1
1     001
2    1000
3    <NA>
dtype: object
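
The difference from the built-in str.zfill() noted above can be seen in plain Python: str.zfill() is sign-aware, while left-padding with '0' treats the sign as an ordinary character, which is the behavior shown in the example output. The sketch uses str.rjust() as a stand-in for the plain padding.

```python
values = ["-1", "1", "1000"]

# Built-in zfill keeps the sign in front of the inserted zeros.
builtin = [v.zfill(3) for v in values]

# Plain left-padding with "0" puts the zeros before the sign.
plain_pad = [v.rjust(3, "0") for v in values]
```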

General Functions

cudf.core.reshape.concat(objs, axis=0, join='outer', ignore_index=False, sort=None)

Concatenate DataFrames, Series, or Indices row-wise.

Parameters
objslist of DataFrame, Series, or Index
axis{0/’index’, 1/’columns’}, default 0

The axis to concatenate along.

join{‘inner’, ‘outer’}, default ‘outer’

How to handle indexes on other axis (or axes).

ignore_indexbool, default False

Set True to ignore the index of the objs and provide a default range index instead.

sortbool, default False

Sort non-concatenation axis if it is not already aligned.

Returns
A new object of like type with rows from each object in objs.

Examples

Combine two Series.

>>> import cudf
>>> s1 = cudf.Series(['a', 'b'])
>>> s2 = cudf.Series(['c', 'd'])
>>> s1
0    a
1    b
dtype: object
>>> s2
0    c
1    d
dtype: object
>>> cudf.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: object

Clear the existing index and reset it in the result by setting the ignore_index option to True.

>>> cudf.concat([s1, s2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object

Combine two DataFrame objects with identical columns.

>>> df1 = cudf.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = cudf.DataFrame([['c', 3], ['d', 4]],
...                    columns=['letter', 'number'])
>>> df2
  letter  number
0      c       3
1      d       4
>>> cudf.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with null values.

>>> df3 = cudf.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                    columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> cudf.concat([df1, df3], sort=False)
  letter  number animal
0      a       1   <NA>
1      b       2   <NA>
0      c       3    cat
1      d       4    dog

Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.

>>> cudf.concat([df1, df3], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects horizontally along the x axis by passing in axis=1.

>>> df4 = cudf.DataFrame([['bird', 'polly'], ['monkey', 'george']],
...                    columns=['animal', 'name'])
>>> df4
   animal    name
0    bird   polly
1  monkey  george
>>> cudf.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
cudf.core.reshape.get_dummies(df, prefix=None, prefix_sep='_', dummy_na=False, columns=None, cats=None, sparse=False, drop_first=False, dtype='uint8')

Returns a dataframe whose columns are the one hot encodings of all columns in df

Parameters
dfarray-like, Series, or DataFrame

Data of which to get dummy indicators.

prefixstr, dict, or sequence, optional

prefix to append. Either a str (to apply a constant prefix), dict mapping column names to prefixes, or sequence of prefixes to apply with the same length as the number of columns. If not supplied, defaults to the empty string

prefix_sepstr, dict, or sequence, optional, default ‘_’

separator to use when appending prefixes

dummy_naboolean, optional

Add a column to indicate nulls (None). If False, nulls are ignored.

catsdict, optional

Dictionary mapping column names to sequences of integers representing that column’s category. See cudf.DataFrame.one_hot_encoding for more information. If not supplied, it will be computed.

sparseboolean, optional

Right now this is NON-FUNCTIONAL argument in rapids.

drop_firstboolean, optional

Right now this is NON-FUNCTIONAL argument in rapids.

columnssequence of str, optional

Names of columns to encode. If not provided, will attempt to encode all columns. Note this is different from pandas default behavior, which encodes all columns with dtype object or categorical

dtypestr, optional

output dtype, default ‘uint8’

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": ["value1", "value2", None], "b": [0, 0, 0]})
>>> cudf.get_dummies(df)
   b  a_value1  a_value2
0  0         1         0
1  0         0         1
2  0         0         0
>>> cudf.get_dummies(df, dummy_na=True)
   b  a_None  a_value1  a_value2
0  0       0         1         0
1  0       0         0         1
2  0       1         0         0
>>> import numpy as np
>>> df = cudf.DataFrame({"a":cudf.Series([1, 2, np.nan, None],
...                     nan_as_null=False)})
>>> df
      a
0   1.0
1   2.0
2   NaN
3  <NA>
>>> cudf.get_dummies(df, dummy_na=True, columns=["a"])
   a_1.0  a_2.0  a_nan  a_null
0      1      0      0       0
1      0      1      0       0
2      0      0      1       0
3      0      0      0       1
>>> series = cudf.Series([1, 2, None, 2, 4])
>>> series
0       1
1       2
2    <NA>
3       2
4       4
dtype: int64
>>> cudf.get_dummies(series, dummy_na=True)
   null  1  2  4
0     0  1  0  0
1     0  0  1  0
2     1  0  0  0
3     0  0  1  0
4     0  0  0  1
cudf.core.reshape.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

Parameters
frameDataFrame
id_varstuple, list, or ndarray, optional

Column(s) to use as identifier variables. default: None

value_varstuple, list, or ndarray, optional

Column(s) to unpivot. default: all columns that are not set as id_vars.

var_namescalar

Name to use for the variable column. default: frame.columns.name or ‘variable’

value_namestr

Name to use for the value column. default: ‘value’

Returns
outDataFrame

Melted result

Difference from pandas:
  • Does not support ‘col_level’ because cuDF does not have multi-index

Examples

>>> import cudf
>>> df = cudf.DataFrame({'A': ['a', 'b', 'c'],
...                      'B': [1, 3, 5],
...                      'C': [2, 4, 6]})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> cudf.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> cudf.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> cudf.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5
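
The wide-to-long reshaping performed by melt can be illustrated on a plain dict of lists. This is a conceptual sketch only: the function melt_columns is hypothetical and says nothing about how cuDF implements the operation on the GPU.

```python
def melt_columns(columns, id_vars, value_vars,
                 var_name="variable", value_name="value"):
    """Unpivot a dict of equal-length lists from wide to long format."""
    out = {name: [] for name in [*id_vars, var_name, value_name]}
    n_rows = len(next(iter(columns.values())))
    # One block of n_rows output rows per unpivoted column.
    for var in value_vars:
        for i in range(n_rows):
            for name in id_vars:
                out[name].append(columns[name][i])
            out[var_name].append(var)
            out[value_name].append(columns[var][i])
    return out


wide = {"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]}
long = melt_columns(wide, id_vars=["A"], value_vars=["B", "C"])
```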
cudf.core.reshape.merge_sorted(objs, keys=None, by_index=False, ignore_index=False, ascending=True, na_position='last')

Merge a list of sorted DataFrame or Series objects.

Dataframes/Series in objs list MUST be pre-sorted by columns listed in keys, or by the index (if by_index=True).

Parameters
objslist of DataFrame, Series, or Index
keyslist, default None

List of column names to sort by. If None, all columns are used (ignored if by_index=True).

by_indexbool, default False

Use index for sorting. keys input will be ignored if True

ignore_indexbool, default False

Drop and ignore index during merge. Default range index will be used in the output dataframe.

ascendingbool, default True

Sorting is in ascending order, otherwise it is descending

na_position{‘first’, ‘last’}, default ‘last’

‘first’ nulls at the beginning, ‘last’ nulls at the end

Returns
A new, lexicographically sorted, DataFrame/Series.
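
The idea of merging inputs that are already sorted can be illustrated with the standard library's heapq.merge, which has the same precondition: every input must be pre-sorted on the key. This is an analogy in plain Python, not the cuDF implementation; the row tuples are made up for illustration.

```python
import heapq

# Two inputs, each already sorted by the first tuple element (the "key" column).
left = [(1, "a"), (4, "d"), (6, "f")]
right = [(2, "b"), (3, "c"), (5, "e")]

# Ascending merge: the output is sorted without re-sorting the inputs.
merged = list(heapq.merge(left, right, key=lambda row: row[0]))

# Descending merge: the inputs must then be sorted in descending order too.
descending = list(
    heapq.merge(left[::-1], right[::-1], key=lambda row: row[0], reverse=True)
)
```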
cudf.core.reshape.pivot(data, index=None, columns=None, values=None)

Return reshaped DataFrame organized by the given index and column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame.

Parameters
indexcolumn name, optional

Column used to construct the index of the result.

columnscolumn name, optional

Column used to construct the columns of the result.

valuescolumn name or list of column names, optional

Column(s) whose values are rearranged to produce the result. If not specified, all remaining columns of the DataFrame are used.

Returns
DataFrame

Examples

>>> import cudf
>>> a = cudf.DataFrame()
>>> a['a'] = [1, 1, 2, 2]
>>> a['b'] = ['a', 'b', 'a', 'b']
>>> a['c'] = [1, 2, 3, 4]
>>> a.pivot(index='a', columns='b')
   c
b  a  b
a
1  1  2
2  3  4

Pivot with missing values in result:

>>> a = cudf.DataFrame()
>>> a['a'] = [1, 1, 2]
>>> a['b'] = [1, 2, 3]
>>> a['c'] = ['one', 'two', 'three']
>>> a.pivot(index='a', columns='b')
      c
b     1     2      3
a
1   one   two   <NA>
2  <NA>  <NA>  three
cudf.core.reshape.unstack(df, level, fill_value=None)

Pivot one or more levels of the (necessarily hierarchical) index labels.

Pivots the specified levels of the index labels of df to the innermost levels of the columns labels of the result.

  • If the index of df has multiple levels, returns a DataFrame with the specified level of the index pivoted to the column levels.

  • If the index of df has single level, returns a Series with all column levels pivoted to the index levels.

Parameters
dfDataFrame
levellevel name or index, list-like

Integer, name or list of such, specifying one or more levels of the index to pivot

fill_value

Non-functional argument provided for compatibility with Pandas.

Returns
Series or DataFrame

Examples

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['a'] = [1, 1, 1, 2, 2]
>>> df['b'] = [1, 2, 3, 1, 2]
>>> df['c'] = [5, 6, 7, 8, 9]
>>> df['d'] = ['a', 'b', 'a', 'd', 'e']
>>> df = df.set_index(['a', 'b', 'd'])
>>> df
       c
a b d
1 1 a  5
  2 b  6
  3 a  7
2 1 d  8
  2 e  9

Unstacking level ‘a’:

>>> df.unstack('a')
        c
a       1     2
b d
1 a     5  <NA>
  d  <NA>     8
2 b     6  <NA>
  e  <NA>     9
3 a     7  <NA>

Unstacking level ‘d’ :

>>> df.unstack('d')
        c
d       a     b     d     e
a b
1 1     5  <NA>  <NA>  <NA>
  2  <NA>     6  <NA>  <NA>
  3     7  <NA>  <NA>  <NA>
2 1  <NA>  <NA>     8  <NA>
  2  <NA>  <NA>  <NA>     9

Unstacking multiple levels:

>>> df.unstack(['b', 'd'])
      c
b     1           2           3
d     a     d     b     e     a
a
1     5  <NA>     6  <NA>     7
2  <NA>     8  <NA>     9  <NA>

Unstacking single level index dataframe:

>>> df = cudf.DataFrame({('c', 1): [1, 2, 3], ('c', 2):[9, 8, 7]})
>>> df.unstack()
c  1  0    1
      1    2
      2    3
   2  0    9
      1    8
      2    7
dtype: int64
cudf.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit='ns', infer_datetime_format=False, origin='unix', cache=True)

Convert argument to datetime.

Parameters
argint, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

The object to convert to a datetime.

errors{‘ignore’, ‘raise’, ‘coerce’, ‘warn’}, default ‘raise’
  • If ‘raise’, then invalid parsing will raise an exception.

  • If ‘coerce’, then invalid parsing will be set as NaT.

  • If ‘warn’, prints last exceptions as warnings and returns the input.

  • If ‘ignore’, then invalid parsing will return the input.

dayfirstbool, default False

Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

formatstr, default None

The strftime to parse time, eg “%d/%m/%Y”, note that “%f” will parse all the way up to nanoseconds. See strftime documentation for more information on choices: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.

unitstr, default ‘ns’

The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. The value is interpreted relative to the origin (the Unix epoch start). For example, with unit=’ms’ and origin=’unix’ (the default), arg is treated as the number of milliseconds since the Unix epoch start.

infer_datetime_formatbool, default False

If True and no format is given, attempt to infer the format of the datetime strings, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.

Returns
datetime

If parsing succeeded. Return type depends on input:

  • list-like: DatetimeIndex

  • Series: Series of datetime64 dtype

  • scalar: Timestamp

Examples

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.

>>> import cudf
>>> df = cudf.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> cudf.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
>>> cudf.to_datetime(1490195805, unit='s')
numpy.datetime64('2017-03-22T15:16:45.000000000')
>>> cudf.to_datetime(1490195805433502912, unit='ns')
numpy.datetime64('1780-11-20T01:02:30.494253056')
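
How unit and origin='unix' combine for a numeric arg can be sketched with the standard library: the number is a count of units since 1970-01-01 00:00:00 UTC. The helper from_epoch is hypothetical and covers only units that datetime.timedelta can represent, so 'ns' is left out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # origin='unix'

# One step of each supported unit (timedelta has microsecond resolution).
SCALE = {
    "D": timedelta(days=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def from_epoch(value, unit):
    """Interpret value as a count of `unit` since the Unix epoch."""
    return EPOCH + value * SCALE[unit]


in_seconds = from_epoch(1490195805, "s")
in_millis = from_epoch(1490195805000, "ms")
```

Both calls describe the same instant, 2017-03-22 15:16:45 UTC, which agrees with the unit='s' example above.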
cudf.to_numeric(arg, errors='raise', downcast=None)

Convert argument into numerical types.

Parameters
argcolumn-convertible

The object to convert to numeric types

errors{‘raise’, ‘ignore’, ‘coerce’}, defaults ‘raise’

Policy to handle errors during parsing.

  • ‘raise’ will notify the user of all errors encountered.

  • ‘ignore’ will skip errors and return arg.

  • ‘coerce’ will leave invalid values as nulls.

downcast{‘integer’, ‘signed’, ‘unsigned’, ‘float’}, defaults None

If set, will try to down-convert the datatype of the parsed results to smallest possible type. For each downcast type, this method will determine the smallest possible dtype from the following sets:

  • {‘integer’, ‘signed’}: all integer types greater or equal to np.int8

  • {‘unsigned’}: all unsigned types greater or equal to np.uint8

  • {‘float’}: all floating types greater or equal to np.float32

Note that downcast behavior is decoupled from parsing. Errors encountered during downcast are raised regardless of the errors parameter.

Returns
Series or ndarray

If a Series is passed in, a Series is returned; otherwise an ndarray is returned.

Notes

An important difference from pandas is that this function does not accept mixed numeric/non-numeric type sequences, for example [1, 'a']. A TypeError will be raised when such input is received, regardless of the errors parameter.

Examples

>>> s = cudf.Series(['1', '2.0', '3e3'])
>>> cudf.to_numeric(s)
0       1.0
1       2.0
2    3000.0
dtype: float64
>>> cudf.to_numeric(s, downcast='float')
0       1.0
1       2.0
2    3000.0
dtype: float32
>>> cudf.to_numeric(s, downcast='signed')
0       1
1       2
2    3000
dtype: int16
>>> s = cudf.Series(['apple', '1.0', '3e3'])
>>> cudf.to_numeric(s, errors='ignore')
0    apple
1      1.0
2      3e3
dtype: object
>>> cudf.to_numeric(s, errors='coerce')
0      <NA>
1       1.0
2    3000.0
dtype: float64
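
The downcast rule for 'integer' and 'signed' (pick the smallest signed integer type, starting from int8, that can hold every parsed value) can be modelled in plain Python. This is a sketch of the rule as described, not cuDF's selection code; the helper smallest_signed and the SIGNED table are hypothetical.

```python
# Candidate signed integer types, from narrowest to widest.
SIGNED = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def smallest_signed(values):
    """Return the name of the narrowest signed integer type holding all values."""
    for name, bits in SIGNED:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if all(lo <= v <= hi for v in values):
            return name
    raise OverflowError("values do not fit in int64")


# Parsing comes first, downcasting second: the two steps are independent.
parsed = [int(float(text)) for text in ["1", "2.0", "3e3"]]
chosen = smallest_signed(parsed)
```

With the values 1, 2 and 3000 the model selects int16, the same dtype shown for downcast='signed' in the example above.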

Index

class cudf.core.index.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([deep])

Make a copy of this object’s indices and data.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element in the Index is True.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
array

A cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
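
The ordering rule above can be illustrated on the CPU with a short pure-Python sketch. It is not cuDF's implementation: the helper name argsort_sketch is hypothetical, plain lists and tuples stand in for Index and MultiIndex, and the descending order is assumed to be the reverse of the ascending order, which agrees with the outputs shown above.

```python
def argsort_sketch(values, ascending=True):
    """Return the positions that would sort `values` (hypothetical helper)."""
    # Sort the positions by the value found at each position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Assumption: descending order is the ascending order reversed.
    return order if ascending else order[::-1]


flat = [10, 100, 1, 1000]
# Tuples compare level by level, like the rows of a MultiIndex.
multi = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]

print(argsort_sketch(flat))
print(argsort_sketch(multi))
print(argsort_sketch(multi, ascending=False))
```

Indexing the original values with the returned positions yields them in sorted order.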
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
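
As a rough illustration of the scalar case, the clamping rule can be written in a few lines of pure Python. This is a sketch only: clip_sketch is a hypothetical name, a list stands in for a Series or Index, and array-like thresholds and inplace are left out.

```python
def clip_sketch(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # values below the minimum are raised to it
        if upper is not None and value > upper:
            value = upper  # values above the maximum are lowered to it
        clipped.append(value)
    return clipped


data = [1, 2, 3, 4]
print(clip_sketch(data, lower=2, upper=3))
print(clip_sketch(data, upper=3))
print(clip_sketch(data, lower=2))
```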
copy(deep: bool = True)

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below). When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Parameters
deepbool, default True

Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns
copySeries or DataFrame

Object type matches caller.

Examples

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s._column is shallow._column and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by the shallow copy and the original are reflected in both; the deep copy remains unchanged.

>>> s['a'] = 3
>>> shallow['b'] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cuDF attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
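
The two sort modes can be sketched in pure Python as follows. The helper difference_sketch is hypothetical and works on plain lists; it assumes that the result holds each surviving label once, in order of first appearance, before any sorting is attempted.

```python
def difference_sketch(left, right, sort=None):
    """Labels of `left` that are absent from `right` (hypothetical helper)."""
    exclude = set(right)
    seen = set()
    result = []
    for label in left:
        if label not in exclude and label not in seen:
            seen.add(label)
            result.append(label)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable labels: keep the unsorted result
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
print(difference_sketch(idx1, idx2))
print(difference_sketch(idx1, idx2, sort=False))
```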
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

Return True if the Index is empty, False otherwise.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.
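
Since no example accompanies this entry, here is a pure-Python sketch of what integer encoding means. It is an illustration under stated assumptions, not cuDF's algorithm: factorize_sketch is a hypothetical name, None stands in for a null value, and the distinct values are numbered in order of first appearance (the pandas convention), which may differ from the order cuDF returns.

```python
def factorize_sketch(values, na_sentinel=-1):
    """Return (codes, uniques) for `values` (hypothetical helper)."""
    uniques = []
    position = {}  # distinct value -> its integer label
    codes = []
    for value in values:
        if value is None:
            codes.append(na_sentinel)  # nulls receive the sentinel code
            continue
        if value not in position:
            position[value] = len(uniques)
            uniques.append(value)
        codes.append(position[value])
    return codes, uniques


codes, uniques = factorize_sketch(["b", "a", None, "b"])
print(codes, uniques)
```

Each non-sentinel code is a position into uniques, so the original values can be recovered from the pair.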

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
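
For a sorted list of labels, the two sides correspond to the standard bisect functions, as the sketch below shows. It assumes the labels are sorted in ascending order and ignores the kind argument; slice_bound_sketch is a hypothetical name rather than part of cuDF.

```python
from bisect import bisect_left, bisect_right


def slice_bound_sketch(sorted_labels, label, side):
    """Position of `label` usable as a slice bound (hypothetical helper)."""
    if side == "left":
        return bisect_left(sorted_labels, label)   # leftmost position
    if side == "right":
        return bisect_right(sorted_labels, label)  # one past the rightmost
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 5]
start = slice_bound_sketch(labels, 2, "left")
stop = slice_bound_sketch(labels, 2, "right")
print(start, stop, labels[start:stop])
```

Using the left bound as the start of a slice and the right bound as its stop selects every occurrence of the label.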

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is equal to or greater than the previous one).

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
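
The membership test amounts to one lookup per index value. A minimal pure-Python sketch, with the hypothetical helper isin_sketch and a list standing in for the returned CuPy array:

```python
def isin_sketch(index_values, values):
    """One boolean per index value: is it among `values`? (hypothetical)"""
    lookup = set(values)  # set membership keeps each check cheap
    return [value in lookup for value in index_values]


flags = isin_sketch([1, 2, 3], [1, 4])
print(flags)
```

The result always has the length of the index, regardless of how many sought values are passed.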
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
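
The classification rules listed above can be restated as a small pure-Python predicate. This is a sketch under assumptions: isna_sketch is a hypothetical name, None stands in for an entry whose null mask is set, and NaT handling is omitted.

```python
import math


def isna_sketch(values):
    """True where a value counts as missing (hypothetical helper)."""
    return [
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in values
    ]


sample = [1, 2, None, float("nan"), 0.32, float("inf")]
print(isna_sketch(sample))
print(isna_sketch(["", "a"]))
```

Infinity and the empty string are ordinary values here, in line with the description above.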
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a": [2, 3, 1], "b": [3, 4, 2]}
... ).set_index(['a', 'b']).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
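
The replacement rule for the Series case can be sketched in pure Python. The helper mask_sketch is hypothetical, lists stand in for the Series and the boolean condition, only a scalar other is handled, and None stands in for <NA>.

```python
def mask_sketch(values, cond, other=None):
    """Replace entries where `cond` is True with `other` (hypothetical)."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [value > 2 for value in ser]
print(mask_sketch(ser, cond, 10))
print(mask_sketch(ser, cond))
```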
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple, where data_keyword is a string naming the keyword argument of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
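
The two calling conventions described above can be sketched without a GPU. The helper below is a hypothetical plain-Python stand-in for .pipe, not cuDF's implementation, and add and scale are made-up functions that operate on lists:

```python
def pipe(obj, func, *args, **kwargs):
    """Hypothetical stand-in for obj.pipe(func, *args, **kwargs)."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)


def add(data, amount):
    return [x + amount for x in data]


def scale(factor, data):
    # The data arrives as the second argument, so it is piped by keyword.
    return [x * factor for x in data]


values = [1, 2, 3]
shifted = pipe(values, add, amount=10)             # add(values, amount=10)
scaled = pipe(shifted, (scale, "data"), factor=2)  # scale(factor=2, data=shifted)
print(shifted, scaled)
```

With the tuple form, the data keyword must not also be passed explicitly; this sketch raises a ValueError in that case.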
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
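
The tie-breaking methods listed above can be compared side by side with a small plain-Python sketch. The rank function here is a hypothetical illustration of the documented semantics for ascending order without missing values. It is not cuDF's implementation and does not cover na_option or pct:

```python
def rank(values, method="average"):
    """Rank values ascending (1-based); illustrative only, no NA handling."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    group = 0  # number of distinct values seen so far (used by 'dense')
    pos = 0
    while pos < len(order):
        # Extend `end` to cover every element tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        group += 1
        first, last = pos + 1, end + 1  # 1-based rank span of this tie group
        for offset, idx in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[idx] = (first + last) / 2
            elif method == "min":
                ranks[idx] = float(first)
            elif method == "max":
                ranks[idx] = float(last)
            elif method == "first":
                ranks[idx] = float(first + offset)
            elif method == "dense":
                ranks[idx] = float(group)
            else:
                raise ValueError(f"unknown method: {method!r}")
        pos = end + 1
    return ranks


data = [7, 3, 7, 1, 9]
for method in ("average", "min", "max", "first", "dense"):
    print(method, rank(data, method))
```

The two 7s occupy rank positions 3 and 4, so only their ranks differ between methods. With dense, the value 9 receives rank 4 rather than 5 because no rank is skipped after a tie.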

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer, each column is rounded to the same number of decimal places:

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as keys and the number of decimal places as values:

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as the index and the number of decimal places as values:

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis=1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
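
The routing rule can be illustrated with plain Python lists. This is a hypothetical sketch, not cuDF's implementation. It assumes map_index holds integer destination ids in the range [0, map_size), and that when map_size is omitted the output has one entry per id up to the largest one present:

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Split rows into map_size lists; row i goes to list map_index[i]."""
    if map_size is None:
        # Assumed default: one output per destination id up to the largest.
        map_size = max(map_index, default=-1) + 1
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)
    return parts


rows = ["r0", "r1", "r2", "r3"]
parts = scatter_by_map(rows, [1, 0, 1, 2])
padded = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
print(parts)
print(padded)
```

Passing a map_size larger than the number of distinct destinations leaves the extra outputs empty, which is useful when every caller must receive the same number of partitions.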
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
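
As a rough illustration of positional shifting, the plain-Python sketch below moves list elements by periods positions. It is not cuDF's implementation and it ignores freq and axis. It assumes that a positive periods moves values toward the end, a negative one toward the start, and that vacated slots are filled with fill_value (None stands in for a null here):

```python
def shift(values, periods=1, fill_value=None):
    """Shift list elements by `periods` positions, padding vacated slots."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out; only fill values remain.
        return [fill_value] * n
    if periods > 0:
        return [fill_value] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


forward = shift([1, 2, 3, 4], periods=1)
backward = shift([1, 2, 3, 4], periods=-2, fill_value=0)
print(forward, backward)
```

The result always has the same length as the input; elements shifted past either end are dropped.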

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
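
A gather of this kind can be pictured with a plain-Python sketch. The take function below is a hypothetical illustration on lists, not cuDF's implementation. It assumes indices are integer positions, which may repeat and may appear in any order:

```python
def take(values, indices):
    """Gather values[i] for each position i in indices, in the given order."""
    return [values[i] for i in indices]


labels = [10, 20, 30, 40]
picked = take(labels, [3, 0, 0])
print(picked)
```

The output has one element per entry in indices, so its length follows indices rather than the source.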
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.
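
The effect of the two fillna settings on output length can be sketched in plain Python. The to_array function below is a hypothetical stand-in that returns a list, not cuDF's implementation; None plays the role of a null, and promotion to float models the documented promotion to np.float64:

```python
import math


def to_array(values, fillna=None):
    """Return a dense list, either dropping nulls or replacing them with NaN."""
    if fillna is None:
        # Nulls are skipped, so the result may be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, so every element is promoted to float.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError(f"unsupported fillna: {fillna!r}")


column = [1, None, 3]
dropped = to_array(column)
filled = to_array(column, fillna="pandas")
print(dropped, filled)
```

Because skipping nulls changes the length, positions in the result no longer line up with the original rows unless fillna="pandas" is used.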

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')

RangeIndex

class cudf.core.index.RangeIndex(start, stop=None, step=1, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the range of values in RangeIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_contiguous

Returns if the index is contiguous.

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

start

The value of the start parameter (0 if this was not supplied).

step

The value of the step parameter.

stop

The value of the stop parameter.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range([first, last])

Find subrange in the RangeIndex, marked by their positions, that starts greater or equal to first and ends less or equal to last

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side[, kind])

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage(**kwargs)

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_gpu_array([fillna])

Get a dense numba device array for the data.

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using the ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
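
The scalar-threshold behaviour described above can be sketched in plain Python. This illustrates the semantics only and is not how cuDF implements clip; clip_values is a hypothetical helper name.

```python
def clip_values(values, lower=None, upper=None):
    """Return a new list with each value limited to [lower, upper].

    A bound of None disables clipping on that side.
    """
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the lower bound are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the upper bound are lowered to it
        out.append(v)
    return out


data = [1, 2, 3, 4]
both = clip_values(data, lower=2, upper=3)
upper_only = clip_values(data, upper=3)
lower_only = clip_values(data, lower=2)
```

The three results mirror the Series examples above.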
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject optional (default: None), name of index
deepBool (default: False)

Ignored for RangeIndex

dtypenumpy dtype optional (default: None)

Target dtype for underlying range data

nameslist-like optional (default: None)

Kept for compatibility with MultiIndex. Should not be used.

Returns
New RangeIndex instance with the same range, cast to the new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
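
The effect of the sort parameter can be sketched in plain Python. This is an illustration of the semantics under the assumption that the result is deduplicated like a set difference; index_difference is a hypothetical helper, not a cuDF function.

```python
def index_difference(left, right, sort=None):
    """Elements of `left` that are not in `right`, without duplicates.

    sort=None  -> try to sort; keep first-appearance order on TypeError.
    sort=False -> keep the order of first appearance in `left`.
    """
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: leave the result unsorted.
            pass
    return result


sorted_result = index_difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_result = index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
```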
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
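
The three keep options can be sketched in plain Python. This is an illustration only: the helper below is hypothetical and preserves input order, whereas the cuDF output shown above is sorted, so ordering is not something the sketch tries to reproduce.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Remove duplicates from a list according to `keep`."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Deduplicate the reversed list, then restore the original direction.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    if keep is False:
        # Keep only values that occur exactly once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
first = drop_duplicates(animals)
last = drop_duplicates(animals, keep="last")
none_kept = drop_duplicates(animals, keep=False)
```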
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
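
The difference between how='any' and how='all' on a MultiIndex can be sketched with tuples in plain Python. None stands in for <NA> here, and dropna_rows is a hypothetical helper rather than part of cuDF.

```python
def dropna_rows(rows, how="any"):
    """Drop tuples that contain nulls (None stands in for <NA>)."""
    if how == "any":
        # Drop a row if at least one level is null.
        return [r for r in rows if all(v is not None for v in r)]
    if how == "all":
        # Drop a row only if every level is null.
        return [r for r in rows if any(v is not None for v in r)]
    raise ValueError("how must be 'any' or 'all'")


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
any_dropped = dropna_rows(rows, how="any")
all_dropped = dropna_rows(rows + [(None, None)], how="all")
```

With how='all', only the fully null row appended in the last call is removed.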
property dtype

dtype of the range of values in RangeIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first=None, last=None)

Find the subrange of the RangeIndex, identified by positions, that starts at a value greater than or equal to first and ends at a value less than or equal to last.

The range returned is assumed to be monotonically increasing. If no range satisfies the constraint, an exception is raised.

Parameters
first, lastint, optional, Default None

The “start” and “stop” values of the subrange. If None, will use self._start as first, self._stop as last.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
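
The relationship between labels and the returned positions can be sketched with Python's bisect module on a built-in range. This is an illustration under the assumption of an increasing range; it does not model the error case and is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(labels, first=None, last=None):
    """Return (begin, end) positions for an increasing sequence of labels.

    labels[begin:end] holds every label with first <= label <= last.
    """
    begin = 0 if first is None else bisect_left(labels, first)
    end = len(labels) if last is None else bisect_right(labels, last)
    return begin, end


labels = range(10, 20, 2)  # 10, 12, 14, 16, 18
begin, end = find_label_range(labels, first=11, last=16)
selected = list(labels[begin:end])
```

The last selected label sits at position end - 1, as described above.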

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind=None)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelint

A valid value in the RangeIndex

side{‘left’, ‘right’}
kindUnused

To keep consistency with other index types.

Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
property is_contiguous

Returns if the index is contiguous.

property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is less than or equal to the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is greater than or equal to the previous one).

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
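
The membership test can be sketched in plain Python. This shows the semantics only; the hypothetical helper returns a list of booleans rather than a CuPy array.

```python
def isin(index_values, values):
    """Return one boolean per index value: True if it occurs in `values`."""
    lookup = set(values)  # set membership keeps each lookup cheap
    return [v in lookup for v in index_values]


flags = isin([1, 2, 3], [1, 4])
```

The result has the same length as the index, regardless of how many values are sought.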
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
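
The rules above for what counts as <NA> can be sketched for plain Python scalars. This is an illustration only: None stands in for a set null mask, NaT is not modelled, and is_na is a hypothetical helper.

```python
import math


def is_na(value):
    """True for None (null) and float NaN; False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1.0, 2.0, None, float("nan"), 0.32, float("inf")]
na_mask = [is_na(v) for v in values]
```

Empty strings and infinities fall through to False, matching the note above.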
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only Scalar or array like with scalars or dataframe with same dimension as self.

Series expects only scalar or series like with same length

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
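
The replacement rule can be sketched in plain Python for a scalar other. This illustrates the semantics only; None stands in for <NA> and the helper is hypothetical.

```python
def mask(values, cond, other=None):
    """Replace values where cond is True with `other`; keep the rest."""
    return [other if c else v for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
replaced = mask(ser, cond, 10)
nulled = mask(ser, cond)
```

Leaving other at its default turns the masked positions into nulls, as in the last Series example above.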
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(**kwargs)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
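
The snippets above rely on undefined names, so here is a self-contained sketch of the same dispatch rule. Pipeable, add and scale are hypothetical stand-ins for a cuDF object and user functions; the sketch only illustrates how the (callable, data_keyword) tuple is handled.

```python
class Pipeable:
    """Hypothetical container with a pandas-style pipe method."""

    def __init__(self, data):
        self.data = list(data)

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under that keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(f"{data_keyword} is both the pipe target "
                                 "and a keyword argument")
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self becomes the first positional argument.
        return func(self, *args, **kwargs)


def add(obj, amount):
    return Pipeable(v + amount for v in obj.data)


def scale(factor, obj):
    # The data is the second parameter, so pipe must pass it by keyword.
    return Pipeable(v * factor for v in obj.data)


result = Pipeable([1, 2, 3]).pipe(add, amount=1).pipe((scale, "obj"), factor=10)
```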
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
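
The tie-breaking methods can be compared with a plain-Python sketch. It is an illustration of the method and ascending semantics only: NaN handling (na_option), pct and axis are not modelled, and rank here is a hypothetical standalone function.

```python
def rank(values, method="average", ascending=True):
    """Rank values from 1 to n; ties are resolved according to `method`."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")
    n = len(values)
    # Positions ordered by value. sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0] * n
    group = 0  # number of distinct values seen so far (used by 'dense')
    start = 0
    while start < n:
        # order[start:stop] is one run of tied values.
        stop = start + 1
        while stop < n and values[order[stop]] == values[order[start]]:
            stop += 1
        group += 1
        for offset, i in enumerate(order[start:stop]):
            if method == "average":
                ranks[i] = (start + 1 + stop) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = stop
            elif method == "first":
                ranks[i] = start + 1 + offset
            else:  # dense
                ranks[i] = group
        start = stop
    return ranks


data = [3, 1, 4, 1, 5]
by_method = {m: rank(data, method=m)
             for m in ("average", "min", "max", "first", "dense")}
```

The two tied 1s receive 1.5 under average, 1 under min, 2 under max, 1 and 2 under first, and 1 under dense.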

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type(DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
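
The scalar and per-element forms of repeats can be sketched in plain Python. This illustrates the semantics only; the helper is hypothetical and works on lists.

```python
def repeat(values, repeats):
    """Repeat each element consecutively.

    `repeats` is either one non-negative int applied to every element
    or a list with one count per element.
    """
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of values")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


per_element = repeat([0, 2], [3, 4])
uniform = repeat([0, 2], 2)
empty = repeat([0, 2], 0)
```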
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/"columns".

weightsstr or ndarray-like, optional

Only supported for axis=1/"columns".

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
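
How n, frac, replace and random_state interact can be sketched with Python's random module. This is an illustration only: it uses a different random number generator from cuDF, so the sampled items will not match, and rounding frac * len(values) is an assumption of the sketch.

```python
import random


def sample(values, n=None, frac=None, replace=False, random_state=None):
    """Return randomly chosen items from a list."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default to one item unless a fraction is requested.
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # the same seed gives the same sample
    if replace:
        return [rng.choice(values) for _ in range(n)]
    return rng.sample(values, n)


data = [1, 2, 3, 4, 5]
first_draw = sample(data, 3, random_state=0)
second_draw = sample(data, 3, random_state=0)
with_replacement = sample(data, 10, replace=True, random_state=1)
```

Sampling more items than exist, as in the last call, is only possible with replace=True.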
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
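
As a rough model of the scatter described above, the following pure-Python sketch distributes rows into map_size lists. It is not cuDF code: scatter_rows is a hypothetical name, and it assumes that map_index holds integer destinations in the range 0 to map_size - 1.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Distribute rows into map_size lists (illustrative model, not cuDF code)."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    if map_size is None:
        # Assumption: destinations are integers 0..k-1, so k = max + 1.
        map_size = max(map_index, default=-1) + 1
    parts = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        # Each row goes to the output list named by its map_index entry.
        parts[destination].append(row)
    return parts


parts = scatter_rows(["a", "b", "c", "d"], [0, 2, 0, 1], map_size=4)
```

With map_size=4 and only three distinct destinations, the last output list stays empty, which is why map_size may exceed the number of unique values.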
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
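
The difference between side='left' and side='right' can be reproduced on a plain sorted list with the standard-library bisect module. This is a sketch of the semantics for the ascending, one-column case only; insertion_points is a hypothetical helper, not part of cuDF.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    """Return where each value would be inserted to keep ascending order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' places a value before equal entries, 'right' places it after them.
    locate = bisect_left if side == "left" else bisect_right
    return [locate(sorted_values, v) for v in values]


s = [1, 2, 3]
left = insertion_points(s, [1, 3], side="left")
right = insertion_points(s, [1, 3], side="right")
beyond = insertion_points(s, [0, 4])
```

The two sides differ only for values already present in the sorted data; for values outside the range, both return 0 or the length of the data.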
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
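
A minimal pure-Python sketch of shifting follows. It assumes the pandas convention, in which a positive periods moves values toward the end and the vacated positions are filled with fill_value; shift_values is a hypothetical helper and freq and axis are not modelled.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a list by periods positions, padding with fill_value."""
    n = len(values)
    if periods == 0:
        return list(values)
    k = min(abs(periods), n)  # cannot shift further than the length
    pad = [fill_value] * k
    if periods > 0:
        # Positive shift: pad at the front, drop the last k values.
        return pad + list(values[: n - k])
    # Negative shift: drop the first k values, pad at the end.
    return list(values[k:]) + pad


shifted_forward = shift_values([1, 2, 3, 4], periods=1)
shifted_back = shift_values([1, 2, 3, 4], periods=-2, fill_value=0)
```

The result always has the same length as the input; only the positions of the values change.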

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
property start

The value of the start parameter (0 if this was not supplied).

property step

The value of the step parameter.

property stop

The value of the stop parameter.

sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
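
Tiling repeats the whole block of rows, which differs from repeating each row consecutively. The pure-Python sketch below contrasts the two orderings on lists of rows; tile_rows and repeat_rows are hypothetical helpers used only for illustration.

```python
def tile_rows(rows, count):
    """Repeat the whole sequence of rows count times (block after block)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


def repeat_rows(rows, count):
    """Repeat each row count times before moving to the next row."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for row in rows for _ in range(count)]


rows = [[8, 4, 7], [5, 2, 3]]
tiled = tile_rows(rows, 2)
repeated = repeat_rows(rows, 2)
```

Both results contain the same rows; tiled keeps the original order within each copy, while repeated groups identical rows together.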
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output may be smaller than the input.
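
The effect of fillna on the output length can be sketched in plain Python, with None standing in for a null. to_dense is a hypothetical helper that models only the two documented settings, None and “pandas”; it does not reproduce cuDF's dtype handling.

```python
import math


def to_dense(values, fillna=None):
    """Model the two documented fillna settings on a list with None as null."""
    if fillna is None:
        # Nulls are skipped, so the result may be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; values are converted to float so NaN fits.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("this sketch only models fillna=None and fillna='pandas'")


skipped = to_dense([1, None, 3])
filled = to_dense([1, None, 3], fillna="pandas")
```

Because skipped drops the null, positions in the output no longer line up with positions in the original data.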

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_gpu_array(fillna=None)

Get a dense numba device array for the data.

Parameters
fillnastr or None

Replacement value to fill in place of nulls.

Notes

If fillna is None, null values are skipped, so the output may be smaller than the input.

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
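
The selection rule of where, and of its counterpart mask (which replaces values where the condition is True), can be written out in plain Python. The helpers below are hypothetical and operate on lists; they illustrate the rule only and are not how cuDF evaluates it.

```python
def where(values, cond, other=None):
    """Keep values where cond is True, otherwise take the entry from other."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    # A scalar other is broadcast to every position.
    others = other if isinstance(other, (list, tuple)) else [other] * len(values)
    return [v if keep else o for v, keep, o in zip(values, cond, others)]


def mask(values, cond, other=None):
    """Replace values where cond is True: where() with the condition inverted."""
    return where(values, [not c for c in cond], other)


index = [4, 3, 2, 1, 0]
kept = where(index, [v > 2 for v in index], 15)
masked = mask(index, [v > 2 for v in index], 15)
```

kept matches the Index example above, and masked is its complement: the entries that where preserved are the ones mask replaces.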

GenericIndex

class cudf.core.index.GenericIndex(values, **kwargs)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
           names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element in the Index is True.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element in the Index is True.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
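
What argsort returns can be checked against a plain-Python equivalent: the positions of the elements, ordered by the values they point to. The argsort helper below is a hypothetical sketch on lists; MultiIndex entries are modelled as tuples, which Python compares level by level.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort values."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


flat = [10, 100, 1, 1000]
order = argsort(flat)
reverse_order = argsort(flat, ascending=False)

multi = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
multi_order = argsort(multi)
multi_reverse = argsort(multi, ascending=False)

# Using the result as an indexer yields the sorted sequence.
sorted_flat = [flat[i] for i in order]
```

The results agree with the example outputs above for inputs without duplicate entries; how ties are ordered is not modelled here.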
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
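
For the scalar case, clipping reduces to two comparisons per element. The following pure-Python sketch reproduces the Series outputs above on a list; clip_values is a hypothetical helper, and per-column thresholds and inplace are not modelled.

```python
def clip_values(values, lower=None, upper=None):
    """Clamp every element into [lower, upper]; None disables that bound."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the lower threshold are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the upper threshold are lowered to it
        clipped.append(v)
    return clipped


sr = [1, 2, 3, 4]
both = clip_values(sr, lower=2, upper=3)
upper_only = clip_values(sr, upper=3)
lower_only = clip_values(sr, lower=2)
```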
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to the new dtype.
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
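
The two sort settings can be modelled in plain Python as a set difference followed by an optional, best-effort sort. The difference helper below is a hypothetical sketch on lists, not cuDF's implementation.

```python
def difference(values, other, sort=None):
    """Return the elements of values that are not in other."""
    excluded = set(other)
    seen = set()
    result = []
    for v in values:
        if v not in excluded and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            # Attempt to sort; incomparable elements leave the order unchanged.
            result = sorted(result)
        except TypeError:
            pass
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_diff = difference(idx1, idx2)
unsorted_diff = difference(idx1, idx2, sort=False)
mixed_diff = difference([2, "a", 1], [])
```

With sort=False the surviving elements keep the order they had in the calling index, and mixed_diff shows the fallback when the elements cannot be compared.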
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
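
The three keep settings differ in which occurrence of a duplicated value survives. The pure-Python sketch below illustrates that choice with a hypothetical drop_duplicates helper on a list. It keeps survivors in their original order, whereas the cuDF output above is shown sorted, so result ordering is not modelled here.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Remove duplicated values according to keep (illustrative only)."""
    if keep == "first":
        # dict preserves insertion order, so the first occurrence wins.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        return list(reversed(list(dict.fromkeys(reversed(values)))))
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


idx = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
keep_first = drop_duplicates(idx)
keep_last = drop_duplicates(idx, keep="last")
keep_none = drop_duplicates(idx, keep=False)
```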
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
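
The how parameter only matters for a MultiIndex, where an entry has several levels. In the sketch below, entries are modelled as tuples with None as the null marker; dropna is a hypothetical pure-Python helper, not cuDF code.

```python
def dropna(entries, how="any"):
    """Drop entries whose levels are null (None) according to how."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    is_dropped = any if how == "any" else all
    kept = []
    for entry in entries:
        # A flat index entry is treated as a single-level tuple.
        levels = entry if isinstance(entry, tuple) else (entry,)
        if not is_dropped(level is None for level in levels):
            kept.append(entry)
    return kept


midx = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
any_dropped = dropna(midx)
all_dropped = dropna(midx, how="all")
flat_dropped = dropna(["a", None, "b", "c"])
```

With how='all' nothing is removed from midx, because no entry is null in every level.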
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=- 1)

Encode the input values as integer labels.

See also

cudf.core.series.Series.factorize

Encode the input values of Series.
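
As a rough illustration of integer-label encoding, the sketch below is plain Python rather than cuDF. It assumes that missing values (modelled here as None) receive na_sentinel and that unique values are numbered in order of first appearance; the ordering cuDF uses for the unique values is not stated in this section, so treat that part as an assumption.

```python
def factorize(values, na_sentinel=-1):
    """Return (labels, uniques) so that uniques[label] == value for non-missing values."""
    code_of = {}
    labels = []
    for value in values:
        if value is None:
            # Missing values get the sentinel instead of a real label.
            labels.append(na_sentinel)
            continue
        if value not in code_of:
            code_of[value] = len(code_of)
        labels.append(code_of[value])
    # dicts preserve insertion order, so this is first-appearance order.
    uniques = list(code_of)
    return labels, uniques


labels, uniques = factorize(["b", "a", None, "b", "c"])
```

Each non-missing input can be recovered as uniques[label], which is the property that makes the encoding useful for grouping and joins.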

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
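
The column-major to row-major conversion described above can be sketched in plain Python. This is an illustration of the idea only, not the cuDF implementation; the helper name interleave is hypothetical and columns are modelled as equal-length lists.

```python
def interleave(columns):
    """Flatten equal-length columns into one row-major list."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one row at a time; flattening the rows
    # produces col0[0], col1[0], col0[1], col1[1], ...
    return [value for row in zip(*columns) for value in row]


cols = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave(cols)
```

The result alternates between the columns, matching the order shown in the example output above.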
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is less than or equal to the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is greater than or equal to the previous one).

property is_unique

Return whether the index has unique values.
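
The three properties above (monotonic increasing, monotonic decreasing, unique) can be modelled on plain Python lists. This is a sketch of the definitions, not of how cuDF computes them on the GPU; the function names simply mirror the property names.

```python
def is_monotonic_increasing(values):
    # Every value is greater than or equal to its predecessor.
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    # Every value is less than or equal to its predecessor.
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values):
    # No value occurs more than once.
    return len(set(values)) == len(values)


with_ties = [1, 2, 2, 3]
constant = [5, 5, 5]
```

Note that ties are allowed by both monotonic checks, so a constant sequence is monotonic in both directions while not being unique.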

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings ('') and, for float data, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
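
The rules listed above can be summarised with a small plain-Python predicate. It is only a model: None stands in for an entry whose null mask is set, NaT is not modelled, and the helper name is_na is hypothetical.

```python
import math


def is_na(value):
    """True for missing values (None or float NaN), False for everything else."""
    if value is None:
        return True
    # NaN is the only float that counts as missing; inf does not.
    return isinstance(value, float) and math.isnan(value)


samples = [5, None, float("nan"), float("inf"), ""]
flags = [is_na(v) for v in samples]
```

The empty string and inf both map to False, which is the distinction the note above is drawing.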
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings ('') and, for float data, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns
index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
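
The replacement rule can be sketched on plain lists. This is a model of the semantics described above rather than cuDF code: the helper name mask is reused for readability, cond is a list of booleans, and None stands in for <NA> when other is left at its default.

```python
def mask(values, cond, other=None):
    """Replace values[i] with `other` wherever cond[i] is True."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
masked = mask(ser, cond, 10)
nulled = mask(ser, cond)
```

Entries where the condition is False pass through unchanged, so mask is the complement of a where-style selection.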
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings ('') and, for float data, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings ('') and, for float data, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
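
Both calling conventions can be demonstrated with a tiny stand-in class. This is a sketch of the dispatch rule described above, not cuDF's source: Box, add, and scale are hypothetical names introduced only for this illustration.

```python
class Box:
    """Minimal container with a pipe method that follows the rule described above."""

    def __init__(self, data):
        self.data = list(data)

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under that keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self is the first positional argument.
        return func(self, *args, **kwargs)


def add(box, amount):
    return Box(x + amount for x in box.data)


def scale(factor, target):
    # The data is NOT the first parameter here, so pipe needs the tuple form.
    return Box(x * factor for x in target.data)


result = Box([1, 2, 3]).pipe(add, amount=1).pipe((scale, "target"), factor=10)
```

The chain reads top to bottom in the order the steps run, which is the main readability benefit over nested calls.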
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
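
To make the tie-breaking methods concrete, here is a plain-Python sketch that ranks a list under each method. It models only the method and ascending parameters, assumes the input has no missing values, and is not how cuDF computes ranks on the GPU.

```python
def rank(values, method="average", ascending=True):
    """Rank values from 1 to n; ties are resolved according to `method`."""
    methods = ("average", "min", "max", "first", "dense")
    if method not in methods:
        raise ValueError(f"method must be one of {methods}")
    # Positions in sorted order. sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * len(values)
    dense_rank = 0
    start = 0
    while start < len(order):
        # Extend `end` over the run of equal values beginning at `start`.
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        dense_rank += 1
        lowest, highest = start + 1, end  # 1-based ranks spanned by this tie group
        for offset, i in enumerate(order[start:end]):
            if method == "average":
                ranks[i] = (lowest + highest) / 2
            elif method == "min":
                ranks[i] = float(lowest)
            elif method == "max":
                ranks[i] = float(highest)
            elif method == "first":
                ranks[i] = float(lowest + offset)
            else:  # dense
                ranks[i] = float(dense_rank)
        start = end
    return ranks


scores = [7, 2, 7, 1]
by_method = {m: rank(scores, method=m) for m in ("average", "min", "max", "first", "dense")}
```

For scores the two 7s occupy ranks 3 and 4, so 'average' gives both 3.5, 'min' gives 3, 'max' gives 4, 'first' gives 3 then 4, and 'dense' gives 3 because only two distinct values precede them.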

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis=1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is the stat axis for the given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
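
The interaction between replace and random_state can be illustrated with Python's standard random module. This is a sketch of the concepts only: the stdlib generator is not the one cuDF uses, so the specific items drawn will differ, and the helper signature is a simplification of the method above.

```python
import random


def sample(items, n, replace=False, random_state=None):
    """Draw n items; a fixed integer random_state makes the draw repeatable."""
    rng = random.Random(random_state)
    if replace:
        # With replacement: the same item may be drawn more than once,
        # so n may exceed len(items).
        return [rng.choice(items) for _ in range(n)]
    # Without replacement: raises ValueError if n > len(items).
    return rng.sample(items, n)


items = [1, 2, 3, 4, 5]
first = sample(items, 3, random_state=42)
second = sample(items, 3, random_state=42)
with_replacement = sample(items, 10, replace=True, random_state=0)
```

Passing the same integer seed twice yields the same draw, which is what makes results reproducible across runs.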
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be greater than or equal to the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
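
The scatter operation can be modelled on plain lists. The sketch below assumes that map_index holds integer destinations in the range 0 to map_size - 1, which this section does not state explicitly; it illustrates the idea and is not the cuDF implementation.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Split rows into map_size lists; rows[i] goes to partition map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("rows and map_index must have the same length")
    needed = max(map_index) + 1 if map_index else 0
    if map_size is None:
        map_size = needed
    elif map_size < needed:
        raise ValueError("map_size is too small for the destinations in map_index")
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        partitions[destination].append(row)
    return partitions


rows = ["r0", "r1", "r2", "r3"]
parts = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
```

A map_size larger than the number of destinations used simply yields trailing empty partitions, which is useful when a fixed number of outputs is required.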
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order.

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending).

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order.

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
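
For a single ascending column, the side parameter corresponds to the two bisection functions in Python's standard library. The sketch below is a CPU model of that one case (no nulls, no descending order, no multi-column frames) and not the cuDF implementation.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' inserts before any equal entries, 'right' inserts after them.
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
left = searchsorted(s, [1, 3], side="left")
right = searchsorted(s, [1, 3], side="right")
outside = searchsorted(s, [0, 4])
# Binary search assumes sorted input; on unsorted data the answer is unreliable.
unsorted_result = searchsorted([2, 1, 3], [1])
```

The last call returns an insertion point of 0 for the unsorted input, the same misleading answer as in the example above, because bisection never inspects the element that actually equals 1.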
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
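
The relationship between the sorted values and the optional indexer can be sketched in plain Python. This models a flat list only and is not cuDF code; the key point is that indexer[k] is the original position of the k-th sorted value.

```python
def sort_values(values, return_indexer=False, ascending=True):
    """Return a sorted copy, optionally with the positions that produce it."""
    indexer = sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)
    sorted_values = [values[i] for i in indexer]
    if return_indexer:
        return sorted_values, indexer
    return sorted_values


values = [10, 100, 1, 1000]
sorted_desc, indexer = sort_values(values, return_indexer=True, ascending=False)
```

Gathering the original values at the indexer positions reproduces the sorted result, so the indexer can be reused to reorder any other data aligned with the index.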
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
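
The inputs in the examples above are interpreted as radians, not degrees, which is why the tangent of 45 is about 1.62 rather than 1. The following is a minimal sketch using Python's standard math module instead of cuDF (an assumption made so the snippet runs without a GPU); it only illustrates the unit convention.

```python
import math

# Passing 45 directly treats it as 45 radians; this is roughly 1.619775,
# the value shown for element 3 of the Series example.
tan_of_45_radians = math.tan(45.0)

# Converting degrees to radians first gives the familiar tan(45 degrees) ~= 1.
tan_of_45_degrees = math.tan(math.radians(45.0))

print(round(tan_of_45_radians, 6), round(tan_of_45_degrees, 6))
```

If the data is stored in degrees, converting it to radians before calling tan() is likely what is wanted.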
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
self: Input table whose rows are repeated.
count: Number of times to tile the rows. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
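
The tiling behaviour can be sketched in plain Python, with a list of rows standing in for the table. The helper name tile_rows is hypothetical and is not part of cuDF; it only mirrors the row order shown in the output above.

```python
def tile_rows(rows, count):
    """Return the rows repeated count times, keeping their original order."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # The whole block of rows is repeated, not each row individually.
    return [list(row) for _ in range(count) for row in rows]


rows = [[8, 4, 7], [5, 2, 3]]
tiled = tile_rows(rows, 2)
print(tiled)
```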
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be smaller than the input.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
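
The selection rule can be sketched in plain Python with lists in place of a cuDF Index. The function below is a hypothetical stand-in that handles only a scalar other; it is meant to show which positions are kept, not how cuDF computes the result.

```python
def where(values, cond, other):
    """Keep values[i] where cond[i] is True; otherwise use the scalar other."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if keep else other for v, keep in zip(values, cond)]


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]  # [True, True, False, False, False]
result = where(values, cond, 15)
print(result)
```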

MultiIndex

class cudf.core.multiindex.MultiIndex(levels=None, codes=None, sortorder=None, labels=None, names=None, dtype=None, copy=False, name=None, **kwargs)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
codes
empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_contiguous
is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is equal to or less than the previous one).

is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is equal to or greater than the previous one).

is_unique

Return if the index has unique values.

labels
levels
name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Integer number of levels in this MultiIndex.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return a CuPy representation of the MultiIndex.

values_host

Return a numpy representation of the MultiIndex.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in the Index.

append(other)

Append a collection of MultiIndex objects together

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([names, dtype, levels, codes, deep, name])

Returns copy of MultiIndex object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

droplevel([level])

Removes the specified levels from the MultiIndex.

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value)

Fill null values with the specified value.

from_arrow(table)

Convert PyArrow Table to MultiIndex

from_pandas(multiindex[, nan_as_null])

Convert from a Pandas MultiIndex

get_level_values(level)

Return the values at the requested level

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values[, level])

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(names[, inplace])

Alter MultiIndex level names

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert MultiIndex to PyArrow Table

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_pandas([nullable])

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other, inplace])

Replace values where the condition is False.

array_equal

deepcopy

from_frame

from_product

from_tuples

nan_to_num

replace

to_frame

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in the Index.

append(other)

Append a collection of MultiIndex objects together

Parameters
otherMultiIndex or list/tuple of MultiIndex objects
Returns
appendedIndex

Examples

>>> import cudf
>>> idx1 = cudf.MultiIndex(
...     levels=[[1, 2], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]]
... )
>>> idx2 = cudf.MultiIndex(
...     levels=[[3, 4], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]]
... )
>>> idx1
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           )
>>> idx2
MultiIndex([(3,  'red'),
            (3, 'blue'),
            (4,  'red'),
            (4, 'blue')],
           )
>>> idx1.append(idx2)
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue'),
            (3,  'red'),
            (3, 'blue'),
            (4,  'red'),
            (4, 'blue')],
           )
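
The reprs above follow from how levels and codes combine: each row takes, for every level, the label at the position given by that row's code. A minimal plain-Python sketch of this decoding (decode_multiindex is a hypothetical helper, not a cuDF function), with appending modelled as list concatenation:

```python
def decode_multiindex(levels, codes):
    """Expand (levels, codes) into one tuple of labels per row."""
    return [
        tuple(level[code] for level, code in zip(levels, row_codes))
        for row_codes in zip(*codes)  # one tuple of codes per row
    ]


idx1_rows = decode_multiindex([[1, 2], ["blue", "red"]],
                              [[0, 0, 1, 1], [1, 0, 1, 0]])
idx2_rows = decode_multiindex([[3, 4], ["blue", "red"]],
                              [[0, 0, 1, 1], [1, 0, 1, 0]])

# Appending keeps the rows of the first index followed by those of the second.
appended_rows = idx1_rows + idx2_rows
print(appended_rows)
```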
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
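
What argsort returns can be reproduced in plain Python: the positions of the elements, ordered by the values they point at. This sketch uses lists and tuples rather than cuDF objects, and the function is a hypothetical illustration only.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort values."""
    return sorted(range(len(values)),
                  key=values.__getitem__,
                  reverse=not ascending)


flat = [10, 100, 1, 1000]
flat_asc = argsort(flat)
flat_desc = argsort(flat, ascending=False)

# MultiIndex rows compare like tuples: first level first, then the second.
rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
rows_asc = argsort(rows)
rows_desc = argsort(rows, ascending=False)
print(flat_asc, flat_desc, rows_asc, rows_desc)
```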
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
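
The scalar-threshold case from the Series examples can be sketched in plain Python. The helper below is hypothetical and ignores inplace and axis; it only shows how values outside the boundary are moved onto it.

```python
def clip(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # below the minimum threshold
        if upper is not None and value > upper:
            value = upper  # above the maximum threshold
        clipped.append(value)
    return clipped


data = [1, 2, 3, 4]
both = clip(data, lower=2, upper=3)
upper_only = clip(data, upper=3)
lower_only = clip(data, lower=2)
print(both, upper_only, lower_only)
```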
copy(names=None, dtype=None, levels=None, codes=None, deep=False, name=None)

Returns copy of MultiIndex object.

Returns a copy of MultiIndex. The levels and codes value can be set to the provided parameters. When they are provided, the returned MultiIndex is always newly constructed.

Parameters
namessequence of objects, optional (default None)

Names for each of the index levels.

dtypeobject, optional (default None)

MultiIndex dtype, only supports None or object type

levelssequence of arrays, optional (default None)

The unique labels for each level. Original values used if None.

codessequence of arrays, optional (default None)

Integers for each level designating which label at each location. Original values used if None.

deepBool (default False)

If True, ._data, ._levels, ._codes will be copied. Ignored if levels or codes are specified.

nameobject, optional (default None)

To keep consistent with Index.copy, should not be used.

Returns
Copy of MultiIndex Instance

Examples

>>> df = cudf.DataFrame({'Close': [3400.00, 226.58, 3401.80, 228.91]})
>>> idx1 = cudf.MultiIndex(
... levels=[['2020-08-27', '2020-08-28'], ['AMZN', 'MSFT']],
... codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
... names=['Date', 'Symbol'])
>>> idx2 = idx1.copy(
... levels=[['day1', 'day2'], ['com1', 'com2']],
... codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
... names=['col1', 'col2'])
>>> df.index = idx1
>>> df
                     Close
Date       Symbol
2020-08-27 AMZN    3400.00
           MSFT     226.58
2020-08-28 AMZN    3401.80
           MSFT     228.91
>>> df.index = idx2
>>> df
             Close
col1 col2
day1 com1  3400.00
     com2   226.58
day2 com1  3401.80
     com2   228.91
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
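
The effect of sort can be sketched in plain Python. The function below is a hypothetical illustration with lists; dropping repeated values is an assumption made here to match set semantics.

```python
def difference(left, right, sort=None):
    """Elements of left that are not in right, optionally sorted."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_diff = difference(idx1, idx2)
unsorted_diff = difference(idx1, idx2, sort=False)
print(sorted_diff, unsorted_diff)
```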
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
droplevel(level=-1)

Removes the specified levels from the MultiIndex.

Parameters
levellevel name or index, list-like

Integer, name or list of such, specifying one or more levels to drop from the MultiIndex

Returns
A MultiIndex or Index object, depending on the number of remaining levels.

Examples

>>> import cudf
>>> idx = cudf.MultiIndex.from_frame(
...     cudf.DataFrame(
...         {
...             "first": ["a", "a", "a", "b", "b", "b"],
...             "second": [1, 1, 2, 2, 3, 3],
...             "third": [0, 1, 2, 0, 1, 2],
...         }
...     )
... )

Dropping level by index:

>>> idx.droplevel(0)
MultiIndex([(1, 0),
            (1, 1),
            (2, 2),
            (2, 0),
            (3, 1),
            (3, 2)],
           names=['second', 'third'])

Dropping level by name:

>>> idx.droplevel("first")
MultiIndex([(1, 0),
            (1, 1),
            (2, 2),
            (2, 0),
            (3, 1),
            (3, 2)],
           names=['second', 'third'])

Dropping multiple levels:

>>> idx.droplevel(["first", "second"])
Int64Index([0, 1, 2, 0, 1, 2], dtype='int64', name='third')
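
The three cases above can be mirrored in plain Python by treating the MultiIndex as a list of tuples plus a list of level names. The helper drop_levels is hypothetical; it returns flat values when a single level remains, analogous to the Index result in the last example.

```python
def drop_levels(rows, names, to_drop):
    """Remove levels (given by name or position) from each row tuple."""
    if not isinstance(to_drop, (list, tuple)):
        to_drop = [to_drop]
    positions = {
        names.index(d) if isinstance(d, str) else d % len(names)
        for d in to_drop
    }
    keep = [i for i in range(len(names)) if i not in positions]
    if not keep:
        raise ValueError("cannot drop every level")
    if len(keep) == 1:
        # Only one level left: return flat values instead of 1-tuples.
        return [row[keep[0]] for row in rows]
    return [tuple(row[i] for i in keep) for row in rows]


names = ["first", "second", "third"]
rows = list(zip(["a", "a", "a", "b", "b", "b"],
                [1, 1, 2, 2, 3, 3],
                [0, 1, 2, 0, 1, 2]))

by_position = drop_levels(rows, names, 0)
by_name = drop_levels(rows, names, "first")
two_levels = drop_levels(rows, names, ["first", "second"])
print(by_position, two_levels)
```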
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
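
The difference between how='any' and how='all' can be sketched in plain Python, with None standing in for a null and a list of tuples standing in for the MultiIndex. The helper dropna_rows is hypothetical.

```python
def dropna_rows(rows, how="any"):
    """Drop rows where any (or all) of the level values are missing."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    is_missing = any if how == "any" else all
    return [row for row in rows if not is_missing(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
kept_any = dropna_rows(rows, how="any")
kept_all = dropna_rows(rows, how="all")
print(kept_any, kept_all)
```

With this data no row is missing in every level, so how='all' keeps all five rows in the sketch.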
property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

Returns
filledMultiIndex

Examples

>>> import cudf
>>> index = cudf.MultiIndex(
...         levels=[["a", "b", "c", None], ["1", None, "5"]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...       )
>>> index
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> index.fillna('hello')
MultiIndex([(    'a',     '1'),
            (    'a',     '5'),
            (    'b', 'hello'),
            (    'c', 'hello'),
            ('hello',     '1')],
           names=['x', 'y'])
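
As the output shows, the scalar replaces nulls in every level. A plain-Python sketch of that behaviour, with None for null and a hypothetical helper fillna_rows:

```python
def fillna_rows(rows, value):
    """Replace missing entries in every level with a single scalar."""
    if isinstance(value, (list, tuple, set, dict)):
        raise TypeError("value must be a scalar, not list-like")
    return [tuple(value if v is None else v for v in row) for row in rows]


rows = [("a", "1"), ("a", "5"), ("b", None), ("c", None), (None, "1")]
filled = fillna_rows(rows, "hello")
print(filled)
```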
classmethod from_arrow(table)

Convert PyArrow Table to MultiIndex

Parameters
tablePyArrow Table

PyArrow Object which has to be converted to MultiIndex

Returns
cudf MultiIndex

Examples

>>> import cudf
>>> import pyarrow as pa
>>> tbl = pa.table({"a":[1, 2, 3], "b":["a", "b", "c"]})
>>> cudf.MultiIndex.from_arrow(tbl)
MultiIndex([(1, 'a'),
            (2, 'b'),
            (3, 'c')],
           names=['a', 'b'])
classmethod from_pandas(multiindex, nan_as_null=None)

Convert from a Pandas MultiIndex

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> pmi = pd.MultiIndex(levels=[['a', 'b'], ['c', 'd']],
...                     codes=[[0, 1], [1, 1]])
>>> cudf.from_pandas(pmi)
MultiIndex([('a', 'd'),
            ('b', 'd')],
           )
get_level_values(level)

Return the values at the requested level

Parameters
levelint or label
Returns
An Index containing the values at the requested level.
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
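For a sorted sequence of labels, the leftmost and one-past-the-rightmost positions described above correspond to what Python's bisect module computes. The sketch below assumes sorted input and omits the kind parameter; it is an illustration, not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound(sorted_labels, label, side):
    """Return the slice bound for label in an ascending-sorted sequence."""
    if side == "left":
        return bisect_left(sorted_labels, label)   # leftmost position
    if side == "right":
        return bisect_right(sorted_labels, label)  # one past the rightmost
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
left = get_slice_bound(labels, 20, "left")
right = get_slice_bound(labels, 20, "right")
print(left, right, labels[left:right])
```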

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
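
The column-major to row-major conversion can be sketched in plain Python. Here each inner list is treated as one column, which is an assumption chosen to reproduce the output shown above; the helper name is hypothetical.

```python
def interleave_columns(columns):
    """Emit row 0 of every column, then row 1 of every column, and so on."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave_columns(columns)
print(interleaved)
```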
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is equal to or greater than the previous one).
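
The "equal or increasing" and "equal or decreasing" conditions can be written as pairwise comparisons. This is a plain-Python sketch on lists with hypothetical function names, not the cuDF implementation.

```python
def is_monotonic_increasing(values):
    """True if every value is >= the one before it (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is <= the one before it (ties allowed)."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing([1, 2, 2, 3]),
      is_monotonic_decreasing([3, 2, 2, 1]),
      is_monotonic_increasing([1, 3, 2]))
```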

property is_unique

Return if the index has unique values.

isin(values, level=None)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index or MultiIndex

Sought values.

levelstr or int, optional

Name or position of the index level to use (if the index is a MultiIndex).

Returns
is_containedcupy array

CuPy array of boolean values.

Notes

When level is None, values can only be a MultiIndex, or a set or list-like of tuples. When level is provided, values can be an Index or MultiIndex, or a set or list-like of tuples.

Examples

>>> import cudf
>>> import pandas as pd
>>> midx = cudf.from_pandas(pd.MultiIndex.from_arrays([[1,2,3],
...                                  ['red', 'blue', 'green']],
...                                  names=('number', 'color')))
>>> midx
MultiIndex([(1,   'red'),
            (2,  'blue'),
            (3, 'green')],
           names=['number', 'color'])

Check whether the strings in the ‘color’ level of the MultiIndex are in a list of colors.

>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])

To check across the levels of a MultiIndex, pass a list of tuples:

>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
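
Both lookups above can be sketched in plain Python: without level, whole rows are compared; with level, only that level's value is compared. The helper below is hypothetical and works on a list of tuples with a separate list of level names.

```python
def isin(rows, values, level=None, names=None):
    """Return one boolean per row telling whether it is found in values."""
    lookup = set(values)
    if level is None:
        # Compare complete rows, so values must contain tuples.
        return [row in lookup for row in rows]
    position = names.index(level) if isinstance(level, str) else level
    return [row[position] in lookup for row in rows]


names = ["number", "color"]
rows = [(1, "red"), (2, "blue"), (3, "green")]

by_level = isin(rows, ["red", "orange", "yellow"], level="color", names=names)
by_tuple = isin(rows, [(1, "red"), (3, "red")])
print(by_level, by_tuple)
```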
isna()

Identify missing values.

Return a boolean same-sized object indicating whether the values are <NA>. <NA> values are mapped to True; everything else is mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
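
The replacement rule can be written out on plain Python lists. This is a minimal sketch of the element-wise semantics only, not how cuDF computes the result on the GPU; the free function `mask` is a hypothetical helper, and `None` stands in for `<NA>`.

```python
def mask(values, cond, other=None):
    """Replace values[i] with `other` wherever cond[i] is True.

    `other` may be a scalar (None stands in for <NA>) or a list
    with the same length as `values`.
    """
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        return [o if c else v for v, c, o in zip(values, cond, other)]
    return [other if c else v for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]

masked_scalar = mask(ser, cond, 10)
masked_null = mask(ser, cond)
masked_list = mask(ser, cond, [-4, -3, -2, -1, 0])
print(masked_scalar)  # [10, 10, 2, 1, 0]
print(masked_null)    # [None, None, 2, 1, 0]
print(masked_list)    # [-4, -3, 2, 1, 0]
```

Leaving `other` unset mirrors the last example above, where the masked entries become `<NA>`.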
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. For MultiIndex ndim is always 2.

property nlevels

Integer number of levels in this MultiIndex.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
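
The dispatch between the plain form and the tuple form can be sketched with a small stand-in class. This is an illustration under stated assumptions, not cuDF's source: `Pipeable`, `add`, and `scale` are hypothetical names, and the error raised on a keyword clash is an assumption about reasonable behavior.

```python
class Pipeable:
    """Minimal stand-in object exposing a pipe method."""

    def __init__(self, data):
        self.data = data

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under that keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(
                    f"{data_keyword} is both the pipe target and a keyword argument"
                )
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self becomes the first positional argument.
        return func(self, *args, **kwargs)


def add(obj, amount):
    return Pipeable([x + amount for x in obj.data])


def scale(factor, frame):
    # The data arrives as the second parameter, named `frame`.
    return Pipeable([x * factor for x in frame.data])


result = (
    Pipeable([1, 2, 3])
    .pipe(add, amount=1)
    .pipe((scale, "frame"), factor=10)
)
print(result.data)  # [20, 30, 40]
```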
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
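
The five tie-breaking methods are easiest to compare side by side. The function below is a pure-Python sketch of the ranking rules described above, assuming a flat list with no null values; it is not cuDF's implementation and does not model axis, numeric_only, na_option, or pct. The name `rank` as a free function is hypothetical.

```python
def rank(values, method="average", ascending=True):
    """Rank non-null values from 1 through n using the given tie method."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method}")
    n = len(values)
    # Positions in sorted order; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    dense = 0  # number of distinct groups seen so far
    start = 0
    while start < n:
        # Extend `end` to the last position of the current group of ties.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[i] = (start + end) / 2 + 1
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = start + offset + 1
            else:  # dense
                ranks[i] = dense
        start = end + 1
    return [float(r) for r in ranks]


data = [7, 3, 7, 1]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank(data, method=m))
# average [3.5, 2.0, 3.5, 1.0]
# min [3.0, 2.0, 3.0, 1.0]
# max [4.0, 2.0, 4.0, 1.0]
# first [3.0, 2.0, 4.0, 1.0]
# dense [3.0, 2.0, 3.0, 1.0]
```

Only the two tied values (the 7s) differ between methods; values without ties receive the same rank regardless of method, except under dense, where later groups shift down because no ranks are skipped.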

rename(names, inplace=False)

Alter MultiIndex level names

Parameters
nameslist of label

Names to set, length must be the same as number of levels

inplacebool, default False

If True, modifies objects directly, otherwise returns a new MultiIndex instance

Returns
None or MultiIndex

Examples

Renaming each level of a MultiIndex to the specified name:

>>> import cudf
>>> midx = cudf.MultiIndex.from_product(
...     [('A', 'B'), (2020, 2021)], names=['c1', 'c2'])
>>> midx.rename(['lv1', 'lv2'])
MultiIndex([('A', 2020),
            ('A', 2021),
            ('B', 2020),
            ('B', 2021)],
        names=['lv1', 'lv2'])
>>> midx.rename(['lv1', 'lv2'], inplace=True)
>>> midx
MultiIndex([('A', 2020),
            ('A', 2021),
            ('B', 2020),
            ('B', 2021)],
        names=['lv1', 'lv2'])

The names argument must be a list with the same length as MultiIndex.levels:

>>> midx.rename(['lv0'])
Traceback (most recent call last):
ValueError: Length of names must match number of levels in MultiIndex.
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
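
The difference between a single count and a per-element list of counts can be sketched on plain lists. This is an illustration of the semantics only, not cuDF's implementation; the free function `repeat` and its validation errors are assumptions made for the example.

```python
def repeat(values, repeats):
    """Repeat each element consecutively.

    `repeats` is either one non-negative int applied to every element
    or a list holding one non-negative count per element.
    """
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


same_count = repeat([0, 2], 2)
per_element = repeat([0, 2], [3, 4])
empty = repeat([0, 2], 0)
print(same_count)   # [0, 0, 2, 2]
print(per_element)  # [0, 0, 0, 2, 2, 2, 2]
print(empty)        # []
```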
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
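
The meaning of side can be checked against Python's standard bisect module. This is a CPU sketch of the insertion-point rule for ascending data only; it is not cuDF's implementation, and the lexicographic comparison of rows in the multi-column case is an assumption modeled here with tuples. The free function `searchsorted` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Insertion points that keep `sorted_values` in ascending order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
left = searchsorted(s, [1, 3], side="left")
right = searchsorted(s, [1, 3], side="right")
outside = searchsorted(s, [0, 4])
print(left)     # [0, 2]
print(right)    # [1, 3]
print(outside)  # [0, 3]

# Unsorted input silently gives a meaningless position, as noted above.
unsorted = searchsorted([2, 1, 3], [1])
print(unsorted)  # [0]

# Multi-column case: rows compared lexicographically as tuples.
table = [(1, 10), (3, 12), (5, 14), (7, 16)]
row_points = searchsorted(table, [(0, 10), (2, 11), (5, 13), (6, 15)])
print(row_points)  # [0, 1, 2, 3]
```

With ‘left’, a value equal to an existing element is placed before it; with ‘right’, after it. The two only differ when the searched value is already present.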
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
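
The size of a MultiIndex follows from the length of its codes, not from the number of level values. A pure-Python sketch of how levels and codes combine into rows, using the same inputs as the example above (`None` stands in for `<NA>`; the decoding shown is an illustration, not cuDF's internal code):

```python
levels = [["a", "b", "c", None], ["1", None, "5"]]
codes = [[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]]

# Each row takes one code per level; the code indexes into that level.
rows = [
    tuple(level[code] for level, code in zip(levels, row_codes))
    for row_codes in zip(*codes)
]

for row in rows:
    print(row)
# ('a', '1')
# ('a', '5')
# ('b', None)
# ('c', None)
# (None, '1')

size = len(rows)
print(size)  # 5
```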
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
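
The relationship between the sorted copy and the indexer can be sketched on plain lists. This is a minimal illustration, not cuDF's implementation; the free function `sort_values` is a hypothetical helper, and MultiIndex rows are modeled as tuples compared lexicographically.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Return a sorted copy and, optionally, the positions that sort it."""
    indexer = sorted(
        range(len(values)), key=values.__getitem__, reverse=not ascending
    )
    sorted_values = [values[i] for i in indexer]
    if return_indexer:
        return sorted_values, indexer
    return sorted_values


idx = [10, 100, 1, 1000]
sorted_idx, indexer = sort_values(idx, ascending=False, return_indexer=True)
print(sorted_idx)  # [1000, 100, 10, 1]
print(indexer)     # [3, 1, 0, 2]

# Taking `idx` at the indexer positions reproduces the sorted copy.
assert [idx[i] for i in indexer] == sorted_idx

midx = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
sorted_midx = sort_values(midx)
print(sorted_midx)  # [(-10, 1), (1, 1), (1, 5), (3, 11), (4, 11)]
```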
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert MultiIndex to PyArrow Table

Returns
PyArrow Table

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3], "b":[2, 3, 4]})
>>> mindex = cudf.Index(df)
>>> mindex
MultiIndex([(1, 2),
            (2, 3),
            (3, 4)],
           names=['a', 'b'])
>>> mindex.to_arrow()
pyarrow.Table
a: int64
b: int64
>>> mindex.to_arrow()['a']
<pyarrow.lib.ChunkedArray object at 0x7f5c6b71fad0>
[
    [
        1,
        2,
        3
    ]
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_pandas(nullable=False, **kwargs)

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return a CuPy representation of the MultiIndex.

Only the values in the MultiIndex will be returned.

Returns
out: cupy.ndarray

The values of the MultiIndex.

Examples

>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values
array([[1, 1],
       [1, 5],
       [3, 2],
       [4, 2],
       [5, 1]])
>>> type(midx.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the MultiIndex.

Only the values in the MultiIndex will be returned.

Returns
outnumpy.ndarray

The values of the MultiIndex.

Examples

>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values_host
array([(1, 1), (1, 5), (3, 2), (4, 2), (5, 1)], dtype=object)
>>> type(midx.values_host)
<class 'numpy.ndarray'>
where(cond, other=None, inplace=False)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
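
The replacement rule can be sketched in plain Python, with lists standing in for the GPU-backed Index. The helper name `where` and its list-based signature are illustrative assumptions, not cuDF's implementation.

```python
def where(values, cond, other=None):
    """Keep values[i] where cond[i] is True; otherwise take the value from `other`."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    # `other` may be a scalar, or a sequence aligned element-wise with `values`.
    if isinstance(other, (list, tuple)):
        return [v if c else o for v, c, o in zip(values, cond, other)]
    return [v if c else other for v, c in zip(values, cond)]


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
result = where(values, cond, 15)
print(result)  # [4, 3, 15, 15, 15]
```

With the default other=None, replaced positions hold None in this sketch, which stands in for a null entry.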

Int8Index

class cudf.core.index.Int8Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append also accepts a list of Index objects:

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
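
As a rough CPU-side illustration of what the returned positions mean, the following plain-Python sketch reproduces the results above. The helper `argsort` is a hypothetical stand-in built on `sorted`; it is not how cuDF computes the result on the GPU.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort `values`."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


flat = [10, 100, 1, 1000]
print(argsort(flat))                   # [2, 0, 1, 3]
print(argsort(flat, ascending=False))  # [3, 1, 0, 2]

# MultiIndex rows compare like tuples, level by level.
rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
order = argsort(rows)
print(order)                     # [4, 0, 1, 2, 3]
print([rows[i] for i in order])  # the rows in sorted order
```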
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

If copy is True, astype returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
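
The scalar-threshold case used in the Series examples can be sketched in plain Python. The helper `clip` below is a hypothetical list-based stand-in that only models scalar lower and upper bounds; per-column thresholds and inplace are not covered.

```python
def clip(values, lower=None, upper=None):
    """Raise values below `lower` to `lower` and cut values above `upper` to `upper`."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        clipped.append(v)
    return clipped


data = [1, 2, 3, 4]
print(clip(data, lower=2, upper=3))     # [2, 2, 3, 3]
print(clip(data, lower=None, upper=3))  # [1, 2, 3, 3]
print(clip(data, lower=2, upper=None))  # [2, 2, 3, 4]
```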
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cuDF attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
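
The effect of the sort option can be sketched with plain Python lists. The helper `difference` is a hypothetical illustration; it assumes labels are hashable and treats the result as a set (duplicates in the left operand are collapsed), which may not match cuDF in every edge case.

```python
def difference(left, right, sort=None):
    """Labels of `left` that do not appear in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for label in left:
        if label not in exclude and label not in seen:
            seen.add(label)
            result.append(label)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable labels: fall back to the order of first appearance.
            pass
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
print(difference(idx1, idx2))              # [1, 2]
print(difference(idx1, idx2, sort=False))  # [2, 1]
```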
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
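
The difference between how='any' and how='all' on a MultiIndex can be sketched with tuples, using None as a stand-in for a null entry. The helper `dropna` is a hypothetical plain-Python illustration, not cuDF's implementation.

```python
def dropna(rows, how="any"):
    """Drop rows whose levels are null: any level ('any') or every level ('all')."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    is_dropped = any if how == "any" else all
    return [row for row in rows if not is_dropped(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna(rows))             # [(1, 1), (1, 5), (4, 2)]
print(dropna(rows, how="all"))  # all five rows; none is null on every level
```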
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at position end - 1.

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
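
For an index sorted in ascending order, the two sides correspond to the usual bisection bounds. The sketch below uses the standard-library `bisect` module on a plain list; the helper `slice_bound` is hypothetical, assumes sorted labels, and ignores the kind argument.

```python
from bisect import bisect_left, bisect_right


def slice_bound(sorted_labels, label, side):
    """Leftmost position of `label`, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
start = slice_bound(labels, 20, "left")
stop = slice_bound(labels, 20, "right")
print(start, stop)         # 1 3
print(labels[start:stop])  # [20, 20]
```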

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
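
The column-major to row-major conversion can be sketched in plain Python by walking the columns in lockstep. The helper `interleave` is a hypothetical illustration that assumes all columns have the same length.

```python
def interleave(columns):
    """Emit row 0 of every column, then row 1 of every column, and so on."""
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave(columns))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```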
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

property is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
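
The membership test can be sketched with plain Python lists. The helper `isin` below is a hypothetical stand-in that assumes hashable labels and returns a list of booleans rather than a CuPy array.

```python
def isin(index_values, values):
    """One boolean per index value: True if that value occurs in `values`."""
    lookup = set(values)  # set membership keeps each check O(1) on average
    return [v in lookup for v in index_values]


flags = isin([1, 2, 3], [1, 4])
print(flags)  # [True, False, False]
```

The result always has the same length as the index, regardless of how many sought values are passed.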
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
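
The classification rules above can be sketched on plain Python values, with None standing in for an entry whose null mask is set. The helper `is_na` is a hypothetical illustration and does not model NaT.

```python
import math


def is_na(value):
    """True for None (null stand-in) and float NaN; False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, float("nan"), 0.32, float("inf"), ""]
flags = [is_na(v) for v in values]
print(flags)  # [False, False, True, True, False, False, False]
```

Infinity and the empty string both map to False, matching the note above.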
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns
index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object with the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
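
mask is the complement of where: it replaces entries where the condition holds instead of where it fails. The plain-Python sketch below uses hypothetical list-based helpers, with None standing in for <NA>, to show that relationship; it only models a scalar other.

```python
def where(values, cond, other=None):
    """Keep values where cond is True; replace the rest with `other`."""
    return [v if c else other for v, c in zip(values, cond)]


def mask(values, cond, other=None):
    """Replace values where cond is True; keep the rest."""
    return where(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask(ser, cond))      # [None, None, 2, 1, 0]
```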
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
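The rules above can be restated as a small predicate. The following is a minimal pure-Python sketch of those rules, not cuDF's implementation; `notna_sketch` is a hypothetical helper, with `None` standing in for a set null mask and NaT handling omitted.

```python
import math


def notna_sketch(values):
    """Return True for each value that is neither None nor a float NaN."""
    result = []
    for value in values:
        is_missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        result.append(not is_missing)
    return result


# Mirrors the Index example: None and NaN are missing, inf is not.
flags = notna_sketch([1, 2, None, float("nan"), 0.32, float("inf")])
# Empty strings are ordinary values, not missing ones.
string_flags = notna_sketch(["Alfred", "Batman", ""])
print(flags)
print(string_flags)
```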
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
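The two calling conventions can be read as a simple dispatch rule. The following is a minimal pure-Python sketch of that rule, not cuDF's implementation; `pipe_sketch`, `scale`, and `shift_by` are hypothetical names used only for illustration, and plain lists stand in for a Series.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Mimic the dispatch rule described for pipe (illustrative only)."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)


def scale(data, factor):
    return [x * factor for x in data]


def shift_by(amount, data):
    # Takes its data as the second argument, named "data".
    return [x + amount for x in data]


values = [1, 2, 3]
scaled = pipe_sketch(values, scale, factor=10)
shifted = pipe_sketch(scaled, (shift_by, "data"), amount=1)
print(shifted)
```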
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
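To make the tie-breaking methods concrete, here is a minimal pure-Python sketch of ascending ranking under each `method`. It is an illustration of the definitions above, not cuDF's implementation; `rank_sketch` is a hypothetical helper and it ignores missing values, `na_option`, and `pct`.

```python
def rank_sketch(values, method="average"):
    """Rank values in ascending order, 1 through n (no missing-value handling)."""
    # Stable sort, so tied values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # order[start:stop] is one group of tied values.
        stop = start
        while stop < len(order) and values[order[stop]] == values[order[start]]:
            stop += 1
        dense += 1
        for offset, i in enumerate(order[start:stop]):
            if method == "average":
                ranks[i] = (start + 1 + stop) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = stop
            elif method == "first":
                ranks[i] = start + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        start = stop
    return ranks


data = [7, 3, 7, 1]
for name in ("average", "min", "max", "first", "dense"):
    print(name, rank_sketch(data, method=name))
```

The two 7s occupy sorted positions 3 and 4, so they receive 3.5 under average, 3 under min, 4 under max, 3 and 4 under first, and 3 under dense.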

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
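The difference between a scalar and an array of repeats can be sketched in pure Python. This is an illustration of the behaviour shown above, not cuDF's implementation; `repeat_sketch` is a hypothetical helper operating on plain lists.

```python
def repeat_sketch(values, repeats):
    """Repeat each element consecutively; repeats is an int or a list of ints."""
    if isinstance(repeats, int):
        # A scalar applies the same count to every element.
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be a scalar or match the length of values")
    out = []
    for value, count in zip(values, repeats):
        out.extend([value] * count)
    return out


per_element = repeat_sketch([0, 2], [3, 4])
uniform = repeat_sketch([0, 2], 2)
empty = repeat_sketch([0, 2], 0)
print(per_element, uniform, empty)
```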
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/”columns”

weightsstr or ndarray-like, optional

Only supported for axis=1/”columns”

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState is given, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
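No example accompanies this method, so here is a minimal pure-Python sketch of the idea: each row is sent to the output partition named by its entry in map_index. It is not cuDF's implementation; `scatter_by_map_sketch` is a hypothetical helper, rows are plain tuples, and it assumes the map values are integers in the range 0 to map_size - 1.

```python
def scatter_by_map_sketch(rows, map_index, map_size=None):
    """Split rows into map_size lists according to map_index (illustrative)."""
    if map_size is None:
        # Assumption: default to one partition per destination up to the max.
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        parts[destination].append(row)
    return parts


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
parts = scatter_by_map_sketch(rows, [0, 2, 0, 1], map_size=4)
for i, part in enumerate(parts):
    print(i, part)
```

Because map_size is larger than the number of distinct destinations, the last partition in this sketch comes back empty.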
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
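The side parameter has the same meaning as the left and right bisection functions in Python's standard library. The following sketch uses the standard `bisect` module on a plain sorted list to reproduce the one-dimensional Series results above; it illustrates the semantics only and does not call cuDF.

```python
from bisect import bisect_left, bisect_right

sorted_values = [1, 2, 3]

# side='left': first position at which the value could be inserted.
left = [bisect_left(sorted_values, v) for v in [1, 3]]
# side='right': position just past any existing equal entries.
right = [bisect_right(sorted_values, v) for v in [1, 3]]
# Values outside the range map to the two ends.
outside = [bisect_left(sorted_values, v) for v in [0, 4]]

print(left, right, outside)
```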
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller than the input size.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
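where and mask are mirror images: where keeps values where the condition is True, and mask replaces them. The following is a minimal pure-Python sketch of that relationship on plain lists, not cuDF's implementation.

```python
values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]

# where: keep the original where cond is True, otherwise use the replacement.
where_result = [v if keep else 15 for v, keep in zip(values, cond)]

# mask: use the replacement where cond is True, otherwise keep the original.
mask_result = [15 if hit else v for v, hit in zip(values, cond)]

print(where_result)
print(mask_result)
```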

Int16Index

class cudf.core.index.Int16Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using the ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to the given dtype. The class of the new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New Index instance, cast to the new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
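
The three keep modes can be illustrated with a minimal plain-Python sketch. This is not cuDF code: the helper name drop_duplicates and the list input are assumptions made for illustration, and the sketch preserves input order, whereas the order of the cuDF result shown above may differ.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Plain-Python illustration of the three `keep` modes (not cuDF code)."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:  # keep only the first time a value appears
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Reverse, keep first occurrences, then restore the original order.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last', or False")


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(drop_duplicates(animals, keep="first"))  # ['lama', 'cow', 'beetle', 'hippo']
print(drop_duplicates(animals, keep="last"))   # ['cow', 'beetle', 'lama', 'hippo']
print(drop_duplicates(animals, keep=False))    # ['cow', 'beetle', 'hippo']
```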
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

True if the Index is empty, False otherwise.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=- 1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.
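
As a rough illustration of what encoding values as integer labels means, here is a plain-Python sketch. It is not cuDF's implementation. It assumes that labels are assigned in order of first appearance, that None marks a missing value, and that the result is a (labels, uniques) pair; the ordering and return type in cuDF may differ.

```python
def factorize(values, na_sentinel=-1):
    """Encode values as integer labels; None is treated as missing (assumption)."""
    uniques = []    # distinct values, in order of first appearance
    positions = {}  # value -> its integer label
    labels = []
    for v in values:
        if v is None:
            labels.append(na_sentinel)  # missing values get the sentinel label
            continue
        if v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
        labels.append(positions[v])
    return labels, uniques


labels, uniques = factorize(["b", "a", None, "b", "c"])
print(labels)   # [0, 1, -1, 0, 2]
print(uniques)  # ['b', 'a', 'c']
```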

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at position end - 1.
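
The half-open (begin, end) convention can be sketched in plain Python with the standard bisect module. This is an illustration only, not cuDF's implementation; it assumes the labels are sorted in ascending order and that both first and last are present.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    """Return (begin, end) so that sorted_labels[begin:end] spans first..last."""
    begin = bisect_left(sorted_labels, first)  # leftmost position of `first`
    end = bisect_right(sorted_labels, last)    # one past the rightmost `last`
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range(labels, 20, 30)
print(begin, end)         # 1 4
print(labels[begin:end])  # [20, 20, 30]
print(labels[end - 1])    # 30
```

Because end is one past the last matching position, the pair can be used directly as slice bounds.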

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is equal to or greater than the previous one).

property is_unique

Return whether the index has unique values.
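
The three properties above can be expressed as simple checks over adjacent values. The following plain-Python sketch only illustrates the definitions; the function names mirror the properties, but the code is not taken from cuDF.

```python
def is_monotonic_increasing(values):
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is <= the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values):
    """True if no value occurs more than once."""
    return len(set(values)) == len(values)


print(is_monotonic_increasing([1, 2, 2, 3]))  # True
print(is_monotonic_decreasing([1, 2, 2, 3]))  # False
print(is_monotonic_decreasing([3, 3, 1]))     # True
print(is_unique([1, 2, 2, 3]))                # False
```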

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Number of dimensions of the data. For every Index type other than MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
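
The snippets above use placeholder names (h, g, func, a, b, c) and are not runnable on their own. The following self-contained sketch shows both calling conventions using a small stand-in class; Pipeable, add, and scale are hypothetical names introduced for illustration, and the pipe method here only mimics the behaviour described in this entry rather than reproducing cuDF's code.

```python
class Pipeable:
    """Tiny stand-in for a Series/DataFrame/Index that supports .pipe()."""

    def __init__(self, data):
        self.data = data

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under the named keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(
                    f"{data_keyword} is both the pipe target and a keyword argument"
                )
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self is passed as the first positional argument.
        return func(self, *args, **kwargs)


def add(obj, amount):
    """Takes the data as its first argument."""
    return Pipeable([x + amount for x in obj.data])


def scale(factor, values):
    """Takes the data as its second argument, named `values`."""
    return Pipeable([x * factor for x in values.data])


result = (
    Pipeable([1, 2, 3])
    .pipe(add, amount=1)
    .pipe((scale, "values"), factor=10)
)
print(result.data)  # [20, 30, 40]
```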
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
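
To make the tie-breaking methods concrete, here is a plain-Python sketch of the five method options on a list without missing values. It is an illustration under stated assumptions, not cuDF's implementation: the helper name rank is hypothetical, and na_option, numeric_only, axis, and pct are deliberately left out.

```python
def rank(values, method="average", ascending=True):
    """Rank values 1..n; ties are resolved according to `method`."""
    n = len(values)
    # Stable sort of positions, so tied values keep their input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    dense = 0
    start = 0
    while start < n:
        # Find the last position of the group of ties beginning at `start`.
        stop = start
        while stop + 1 < n and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        dense += 1
        for offset, idx in enumerate(order[start:stop + 1]):
            if method == "average":
                ranks[idx] = (start + 1 + stop + 1) / 2
            elif method == "min":
                ranks[idx] = start + 1
            elif method == "max":
                ranks[idx] = stop + 1
            elif method == "first":
                ranks[idx] = start + 1 + offset
            elif method == "dense":
                ranks[idx] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        start = stop + 1
    return ranks


data = [10, 20, 10, 30]
print(rank(data, "average"))  # [1.5, 3.0, 1.5, 4.0]
print(rank(data, "min"))      # [1, 3, 1, 4]
print(rank(data, "max"))      # [2, 3, 2, 4]
print(rank(data, "first"))    # [1, 3, 2, 4]
print(rank(data, "dense"))    # [1, 2, 1, 3]
```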

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is the stat axis for the given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
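
The interaction of n, frac, replace and random_state can be sketched with the standard library. This is a plain-Python illustration of the documented semantics, not cuDF's sampler: sample_rows is a hypothetical helper, the way frac is rounded to a row count is an assumption, and the positions it picks for a given seed will not match cuDF's.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_rows(rows: Sequence, n: Optional[int] = None,
                frac: Optional[float] = None, replace: bool = False,
                random_state: Optional[int] = None) -> List[Tuple[int, object]]:
    """Return (position, row) pairs sampled from rows (illustration only)."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default is 1 item when neither n nor frac is given.
        # Rounding frac * len(rows) is an assumption of this sketch.
        n = 1 if frac is None else int(round(frac * len(rows)))
    rng = random.Random(random_state)  # same seed -> same sample
    positions = range(len(rows))
    if replace:
        picked = rng.choices(positions, k=n)  # a row may appear more than once
    else:
        picked = rng.sample(positions, n)     # each row appears at most once
    return [(i, rows[i]) for i in picked]


first = sample_rows([1, 2, 3, 4, 5], n=3, random_state=42)
second = sample_rows([1, 2, 3, 4, 5], n=3, random_state=42)
print(first == second)
```

Passing the same random_state twice yields the same sample, which is the reproducibility property mentioned above. Without replacement, n cannot exceed the number of rows.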
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
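
A minimal plain-Python sketch of the scatter idea follows. It assumes that map_index holds integer destinations in the range [0, map_size) and that a missing map_size defaults to the largest destination plus one; both are assumptions of this illustration, and the function here is a hypothetical stand-in working on lists, not the cuDF method.

```python
from typing import List, Optional, Sequence, Tuple


def scatter_by_map(rows: Sequence, map_index: Sequence[int],
                   map_size: Optional[int] = None) -> List[List[Tuple[int, object]]]:
    """Split rows into map_size partitions according to map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumed default: just enough partitions for the destinations seen.
        map_size = max(map_index) + 1 if len(map_index) else 0
    partitions: List[List[Tuple[int, object]]] = [[] for _ in range(map_size)]
    for position, (row, destination) in enumerate(zip(rows, map_index)):
        # Keep the original position with each row, in the spirit of keep_index=True.
        partitions[destination].append((position, row))
    return partitions


parts = scatter_by_map(["a", "b", "c", "d"], [1, 0, 1, 2])
print(parts)
```

A larger map_size simply produces trailing empty partitions, which is why the output list length can exceed the number of distinct destinations.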
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
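
For the one-dimensional, ascending case, the meaning of side can be sketched with Python's bisect module. This is an illustration of the semantics only, not how cuDF computes the result: the helper below is hypothetical, works on plain lists, and does not cover ascending=False, na_position, or multi-column frames.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def searchsorted(sorted_values: Sequence, values: Sequence, side: str = "left") -> List[int]:
    """Insertion points that keep sorted_values in ascending order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' gives the first suitable position, 'right' the last one.
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


print(searchsorted([1, 2, 3], [0, 4]))
print(searchsorted([1, 2, 3], [1, 3], side="left"))
print(searchsorted([1, 2, 3], [1, 3], side="right"))
# Binary search assumes sorted input, so unsorted data gives a misleading answer.
print(searchsorted([2, 1, 3], [1]))
```

The last call mirrors the unsorted example above: binary search never inspects the out-of-order element, so the reported position is not meaningful.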
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
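
The outputs above correspond to inputs interpreted as radians: the value shown for 90 is the sine of 90 radians, not of 90 degrees. The standard-library sketch below shows the difference. sin_of_degrees is a hypothetical helper for this illustration; with cuDF, the degrees would need to be converted to radians before calling sin.

```python
import math


def sin_of_degrees(angle_in_degrees: float) -> float:
    """Convert degrees to radians first, then take the sine."""
    return math.sin(math.radians(angle_in_degrees))


# Treating 90 as radians reproduces the value printed in the examples above.
print(round(math.sin(90.0), 6))
# Treating 90 as degrees needs an explicit conversion.
print(round(sin_of_degrees(90.0), 6))
```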
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
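
The relationship between the sorted copy and the indexer returned with return_indexer=True can be sketched in plain Python: taking the original values at the indexer positions reproduces the sorted result. sort_with_indexer is a hypothetical helper on lists, not the cuDF implementation.

```python
from typing import List, Sequence, Tuple


def sort_with_indexer(values: Sequence, ascending: bool = True) -> Tuple[List, List[int]]:
    """Return (sorted copy, positions in the original that produce it)."""
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    return [values[i] for i in indexer], indexer


values = [10, 100, 1, 1000]
sorted_values, indexer = sort_with_indexer(values, ascending=False)
print(sorted_values, indexer)
# Indexing the original with the indexer gives back the sorted copy.
print([values[i] for i in indexer] == sorted_values)
```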
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
self: Input table containing the rows to tile.
count: Number of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
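
Tiling repeats the whole block of rows, whereas repeat (documented elsewhere in this reference as repeating elements consecutively) repeats each row in place. The plain-Python sketch below contrasts the two orderings on lists of rows; tile_rows and repeat_rows are hypothetical helpers, not cuDF functions.

```python
from typing import List, Sequence


def tile_rows(rows: Sequence[Sequence], count: int) -> List[List]:
    """Whole table repeated count times: r0, r1, r0, r1, ..."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


def repeat_rows(rows: Sequence[Sequence], repeats: int) -> List[List]:
    """Each row repeated consecutively: r0, r0, r1, r1, ..."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return [list(row) for row in rows for _ in range(repeats)]


rows = [[8, 4, 7], [5, 2, 3]]
print(tile_rows(rows, 2))
print(repeat_rows(rows, 2))
```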
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be shorter than the input.
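
The effect of fillna on the output length can be sketched in plain Python, with None standing in for a null. to_dense is a hypothetical helper for this illustration, and converting every value to float in the "pandas" branch is a simplification, not a statement about cuDF's dtype handling.

```python
import math
from typing import List, Optional, Sequence


def to_dense(values: Sequence[Optional[float]], fillna: Optional[str] = None) -> List[float]:
    """Dense list from values that may contain None (illustration only)."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, so the length is preserved.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_dense([1, None, 3]))
print(to_dense([1, None, 3], fillna="pandas"))
```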

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
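
The selection rule can be sketched in plain Python: values are kept where cond is True and replaced elsewhere, and mask (described in this reference as replacing values where the condition is True) is the complement. The helpers below are hypothetical and operate on lists, not on cuDF objects.

```python
from typing import List, Sequence


def where(values: Sequence, cond: Sequence[bool], other=None) -> List:
    """Keep values where cond is True, otherwise take the value from other."""
    if isinstance(other, (list, tuple)):
        replacements = list(other)             # element-wise replacement
    else:
        replacements = [other] * len(values)   # scalar broadcast to every position
    return [v if keep else r for v, keep, r in zip(values, cond, replacements)]


def mask(values: Sequence, cond: Sequence[bool], other=None) -> List:
    """Complement of where: replace values where cond is True."""
    return where(values, [not c for c in cond], other)


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
print(where(values, cond, 15))
print(mask(values, cond, 15))
```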

Int32Index

class cudf.core.index.Int32Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
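
The MultiIndex result above can be reproduced in plain Python by decoding levels and codes into tuples and ordering the tuples lexicographically. This is a sketch of the ordering only: decode and argsort are hypothetical helpers, and nothing here reflects how cuDF sorts on the GPU.

```python
from typing import List, Sequence, Tuple


def decode(levels: Sequence[Sequence], codes: Sequence[Sequence[int]]) -> List[Tuple]:
    """Turn MultiIndex-style levels and codes into one tuple per entry."""
    return [tuple(level[code[i]] for level, code in zip(levels, codes))
            for i in range(len(codes[0]))]


def argsort(entries: Sequence, ascending: bool = True) -> List[int]:
    """Positions that would sort entries (tuples compare lexicographically)."""
    return sorted(range(len(entries)), key=lambda i: entries[i],
                  reverse=not ascending)


entries = decode(levels=[[1, 3, 4, -10], [1, 11, 5]],
                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]])
print(entries)
print(argsort(entries))
print(argsort(entries, ascending=False))
```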
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

If copy is True, astype always returns a newly allocated object. If copy is False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649,
            0.7853981633974483, 0.0, 0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
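
For scalar thresholds, clipping amounts to raising values below lower and lowering values above upper, with None meaning no bound on that side. The plain-Python sketch below illustrates this on a list; clip_values is a hypothetical helper and does not cover the per-column thresholds shown in the DataFrame examples.

```python
from typing import List, Optional, Sequence


def clip_values(values: Sequence[float], lower: Optional[float] = None,
                upper: Optional[float] = None) -> List[float]:
    """Clip each value into [lower, upper]; None disables that bound."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower   # below the minimum threshold: raise to it
        if upper is not None and v > upper:
            v = upper   # above the maximum threshold: lower to it
        clipped.append(v)
    return clipped


print(clip_values([1, 2, 3, 4], lower=2, upper=3))
print(clip_values([1, 2, 3, 4], upper=3))
print(clip_values([1, 2, 3, 4], lower=2))
```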
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
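
The two sort settings can be sketched in plain Python: elements of the first sequence that do not occur in the second are kept, and sort=None then attempts to sort them while tolerating incomparable elements. difference is a hypothetical helper on lists here; dropping repeated values is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def difference(left: Sequence, right: Sequence, sort: Optional[bool] = None) -> List:
    """Elements of left that are not in right (illustration only)."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)          # assumed: repeated values appear once
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)  # attempt to sort the result
        except TypeError:
            pass                     # incomparable elements: keep original order
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))
```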
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
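
Which occurrences survive under each keep setting can be sketched in plain Python. The helper below is hypothetical and preserves the original order of the surviving values; the cuDF example above prints its result in sorted order, so only the set of surviving values should be compared, not their order.

```python
from collections import Counter
from typing import List, Sequence


def drop_duplicates(values: Sequence, keep="first") -> List:
    """Remove duplicates according to keep (illustration only)."""
    if keep == "first":
        # dict preserves insertion order, so the first occurrence wins.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        return list(reversed(list(dict.fromkeys(reversed(values)))))
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # drop every duplicated value
    raise ValueError("keep must be 'first', 'last' or False")


names = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(drop_duplicates(names))
print(drop_duplicates(names, keep="last"))
print(drop_duplicates(names, keep=False))
```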
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
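
The difference between how='any' and how='all' on a MultiIndex can be sketched with tuples in plain Python. The helper drop_null_rows is hypothetical, and None stands in for a null level value.

```python
def drop_null_rows(rows, how="any"):
    """Hypothetical helper: drop tuples containing None, mimicking dropna(how=...)."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    is_dropped = any if how == "any" else all
    # A row is dropped when any (or all) of its levels are null.
    return [row for row in rows if not is_dropped(level is None for level in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(drop_null_rows(rows, how="any"))  # [(1, 1), (1, 5), (4, 2)]
print(drop_null_rows(rows, how="all"))  # all five rows: none is entirely null
```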
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
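
A minimal plain-Python sketch of the returned pair, assuming a list of labels that contains both first and last. The find_label_range function below is a hypothetical stand-in working on a list, not the cuDF method.

```python
def find_label_range(labels, first, last):
    """Hypothetical sketch: (begin, end) such that labels[begin:end] spans first..last."""
    begin = labels.index(first)
    # One past the final occurrence of `last`, so `last` sits at end - 1.
    end = len(labels) - labels[::-1].index(last)
    return begin, end


labels = ["a", "b", "b", "c", "d"]
begin, end = find_label_range(labels, "b", "c")
print(begin, end)         # 1 4
print(labels[begin:end])  # ['b', 'b', 'c']
```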

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
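
The column-major to row-major conversion can be sketched in plain Python, assuming equally long columns held as lists. The helper interleave below is hypothetical and only illustrates the ordering of the result.

```python
def interleave(columns):
    """Hypothetical sketch: flatten equal-length columns row by row."""
    # zip(*columns) walks the table one row at a time; the nested
    # comprehension then emits each row's values in column order.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave(columns))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```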
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is less than or equal to the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is greater than or equal to the previous one).

property is_unique

Return True if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
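
The membership test can be sketched in plain Python as follows. The helper isin is hypothetical and returns a list of booleans rather than a CuPy array.

```python
def isin(index_values, values):
    """Hypothetical sketch: one boolean per index value."""
    lookup = set(values)  # set membership keeps each check cheap
    return [value in lookup for value in index_values]


print(isin([1, 2, 3], [1, 4]))  # [True, False, False]
```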
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
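
The replacement rule can be sketched in plain Python for the scalar case. The helper mask below is hypothetical, works on lists, and uses None to stand in for <NA>.

```python
def mask(values, cond, other=None):
    """Hypothetical sketch: replace entries where cond is True with `other`."""
    # Where the flag is False the original value is kept.
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [value > 2 for value in ser]
print(mask(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask(ser, cond))      # [None, None, 2, 1, 0]
```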
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
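
The dispatch rule behind both call forms can be sketched as a free function in plain Python. This is an illustration under stated assumptions, not the cuDF source; pipe, add and scale below are hypothetical names operating on lists.

```python
def pipe(obj, func, *args, **kwargs):
    """Hypothetical sketch of the pipe dispatch rule."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable: obj is the first positional argument.
    return func(obj, *args, **kwargs)


def add(values, amount):
    return [value + amount for value in values]


def scale(factor, values):
    # The data is not the first parameter here, so the tuple form is needed.
    return [value * factor for value in values]


step1 = pipe([1, 2, 3], add, amount=1)
step2 = pipe(step1, (scale, "values"), factor=10)
print(step2)  # [20, 30, 40]
```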
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average : average rank of the group.

  • min : lowest rank in the group.

  • max : highest rank in the group.

  • first : ranks assigned in the order they appear in the array.

  • dense : like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep : assign NaN rank to NaN values.

  • top : assign smallest rank to NaN values if ascending.

  • bottom : assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
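
The tie-breaking methods can be compared with a plain-Python sketch. It assumes ascending order and no null values, and the helper rank is a hypothetical illustration on lists rather than the cuDF implementation.

```python
def rank(values, method="average"):
    """Hypothetical sketch of tie handling (ascending order, no nulls)."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method: {method}")
    # Positions of the values in sorted order; sorted() is stable, so ties
    # keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # order[start:end] is one group of tied values.
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[start:end]):
            if method == "average":
                ranks[i] = (start + 1 + end) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = end
            elif method == "first":
                ranks[i] = start + 1 + offset
            else:  # dense
                ranks[i] = dense
        start = end
    return ranks


data = [7, 2, 7, 1, 9]
print(rank(data, "average"))  # [3.5, 2.0, 3.5, 1.0, 5.0]
print(rank(data, "min"))      # [3, 2, 3, 1, 5]
print(rank(data, "max"))      # [4, 2, 4, 1, 5]
print(rank(data, "first"))    # [3, 2, 4, 1, 5]
print(rank(data, "dense"))    # [3, 2, 3, 1, 4]
```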

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
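
The scalar and per-element forms of repeats can be sketched in plain Python. The helper repeat is hypothetical and works on lists.

```python
def repeat(values, repeats):
    """Hypothetical sketch: repeat each element consecutively."""
    if isinstance(repeats, int):
        # A single int applies the same count to every element.
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    return [value for value, count in zip(values, repeats) for _ in range(count)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([10, 22], 0))     # []
```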
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis=1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis=1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
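
How n, frac, replace and random_state interact can be sketched with the standard library's random module. The helper sample is hypothetical and works on a list; rounding frac * len(values) to the nearest integer is an assumption of this sketch.

```python
import random


def sample(values, n=None, frac=None, replace=False, random_state=None):
    """Hypothetical sketch of row sampling on a plain list."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default to one item when neither n nor frac is given.
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # same seed, same sample
    if replace:
        # With replacement, n may exceed the number of values.
        return [rng.choice(values) for _ in range(n)]
    return rng.sample(values, n)


data = [1, 2, 3, 4, 5]
print(sample(data, 3, random_state=0) == sample(data, 3, random_state=0))  # True
print(len(sample(data, 10, replace=True, random_state=1)))                 # 10
```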
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
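
The scatter step can be sketched in plain Python, assuming map_index holds integer partition ids in range(map_size). The function below is a hypothetical stand-in that scatters list items instead of DataFrame rows.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical sketch: send each row to the partition named by map_index."""
    if map_size is None:
        # Default to just enough partitions for the ids that occur.
        map_size = max(map_index) + 1 if map_index else 0
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        partitions[destination].append(row)
    return partitions


rows = ["r0", "r1", "r2", "r3"]
print(scatter_by_map(rows, [1, 0, 1, 2]))              # [['r1'], ['r0', 'r2'], ['r3']]
print(scatter_by_map(rows, [1, 0, 1, 2], map_size=4))  # a trailing empty partition is added
```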
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
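
The side option corresponds to the left and right variants of binary search. The sketch below uses the standard library's bisect module on ascending lists; the helper searchsorted is hypothetical and returns Python ints and lists rather than CuPy arrays.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Hypothetical sketch: insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    locate = bisect_left if side == "left" else bisect_right
    if isinstance(values, (list, tuple)):
        return [locate(sorted_values, value) for value in values]
    return locate(sorted_values, values)


s = [1, 2, 3]
print(searchsorted(s, 4))                     # 3
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]

# Binary search assumes sorted input; on unsorted data the answer is unreliable.
print(searchsorted([2, 1, 3], 1))             # 0
```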
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
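
The outputs above are consistent with the inputs being interpreted as radians: the sine of 90 is shown as 0.893997 rather than 1. A short check with the standard library's math module, used here as a CPU stand-in for illustration, shows the difference and how degrees could be converted first.

```python
import math

angles_in_degrees = [0.0, 45.0, 90.0, 180.0]

# Passing the raw numbers treats them as radians.
raw = [math.sin(angle) for angle in angles_in_degrees]

# Converting to radians first gives the familiar degree-based values.
converted = [math.sin(math.radians(angle)) for angle in angles_in_degrees]

print(round(raw[2], 6))        # 0.893997
print(round(converted[2], 6))  # 1.0
```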
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
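How the sorted copy relates to the indexer returned with return_indexer=True can be sketched in plain Python. This only illustrates the semantics described above and is not cuDF's implementation: the helper name sort_with_indexer is hypothetical, and it works on CPU lists rather than GPU-backed Index objects.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted_values, indexer) with sorted_values[k] == values[indexer[k]]."""
    # Hypothetical helper: order the positions by the value stored at each position.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    return [values[i] for i in indexer], indexer


values = [10, 100, 1, 1000]
sorted_desc, indexer = sort_with_indexer(values, ascending=False)
print(sorted_desc, indexer)
```

Applying the indexer to the original values reproduces the sorted copy, which is the same pairing shown in the descending example above.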
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
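A gather operation of this kind can be sketched in plain Python. The sketch assumes that indices holds integer positions into the Index, which the description above does not state explicitly; the function below is a hypothetical CPU stand-in, not cuDF code.

```python
def take(values, indices):
    # Gather: output position k holds values[indices[k]].
    # Positions may repeat and may appear in any order.
    return [values[i] for i in indices]


gathered = take([10, 20, 30, 40], [3, 0, 0])
print(gathered)
```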
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
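Tiling repeats the whole block of rows, in contrast to a consecutive repeat of each row. A minimal plain-Python sketch of that behaviour follows; tile_rows is a hypothetical helper that treats rows as lists and is not the cuDF implementation.

```python
def tile_rows(rows, count):
    if count < 0:
        raise ValueError("count must be non-negative")
    # Emit the full sequence of rows `count` times, keeping row order in each copy.
    return [list(row) for _ in range(count) for row in rows]


tiled = tile_rows([[8, 4, 7], [5, 2, 3]], 2)
print(tiled)
```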
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
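The selection rule can be sketched element by element in plain Python. This is an illustration only: the where function below is a hypothetical CPU helper, and it handles just the scalar form of other.

```python
def where(values, cond, other=None):
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    # Keep the original value where cond is True, otherwise substitute `other`.
    return [v if keep else other for v, keep in zip(values, cond)]


values = [4, 3, 2, 1, 0]
result = where(values, [v > 2 for v in values], 15)
print(result)
```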

Int64Index

class cudf.core.index.Int64Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
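For the MultiIndex case, the ordering can be pictured as a lexicographic comparison of the (x, y) entries. The plain-Python sketch below models each entry as a tuple; that is an assumption made for illustration, and the helper is hypothetical rather than cuDF's implementation.

```python
def argsort(entries, ascending=True):
    # Python compares tuples lexicographically: first on x, then on y.
    return sorted(range(len(entries)), key=lambda i: entries[i],
                  reverse=not ascending)


entries = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
order = argsort(entries)
reordered = [entries[i] for i in order]
print(order, reordered)
```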
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
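The thresholding rule for scalar bounds can be sketched in plain Python. This is a CPU illustration with a hypothetical helper, not the cuDF implementation, and it covers only scalar lower and upper.

```python
def clip(values, lower=None, upper=None):
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the minimum threshold are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the maximum threshold are lowered to it
        clipped.append(v)
    return clipped


both = clip([1, 2, 3, 4], lower=2, upper=3)
upper_only = clip([1, 2, 3, 4], upper=3)
print(both, upper_only)
```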
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
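The effect of the sort parameter can be sketched in plain Python. The helper below is hypothetical and CPU-only; it assumes that duplicates in the calling index collapse to one entry, as in a set difference, which the text above implies but does not spell out.

```python
def difference(left, right, sort=None):
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        # Keep values absent from `right`, once each, in order of appearance.
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    return result


sorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
print(sorted_diff, unsorted_diff)
```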
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
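The three keep options can be compared with a plain-Python sketch. The helper is hypothetical and keeps the surviving values in their original order; the cuDF example above shows sorted output, so the ordering here should be read as an assumption of the sketch, not as cuDF behaviour.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    if keep == "first":
        # dict preserves insertion order, so the first occurrence wins.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Scan from the end so that the last occurrence wins, then restore order.
        return list(reversed(list(dict.fromkeys(reversed(values)))))
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


animals = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']
keep_first = drop_duplicates(animals, keep="first")
keep_last = drop_duplicates(animals, keep="last")
keep_none = drop_duplicates(animals, keep=False)
print(keep_first, keep_last, keep_none)
```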
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
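The difference between how='any' and how='all' on a MultiIndex can be sketched in plain Python. In this illustration each entry is a tuple and None stands in for <NA>; both choices are assumptions of the sketch, and the helper is hypothetical.

```python
def dropna(entries, how="any"):
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    # 'any': drop an entry if at least one level is missing.
    # 'all': drop an entry only if every level is missing.
    is_missing = any if how == "any" else all
    return [e for e in entries if not is_missing(v is None for v in e)]


entries = [(1, 1), (1, 5), (None, 2), (4, 2), (None, None)]
kept_any = dropna(entries, how="any")
kept_all = dropna(entries, how="all")
print(kept_any, kept_all)
```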
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=- 1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
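The relationship between the returned pair and the labels can be sketched with the standard bisect module. The sketch assumes a sorted list of labels in which both first and last are present; it is a CPU illustration with a hypothetical helper, not the cuDF implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    begin = bisect_left(sorted_labels, first)  # position of `first`
    end = bisect_right(sorted_labels, last)    # one past the position of `last`
    return begin, end


labels = [1, 2, 3, 10]
begin, end = find_label_range(labels, 2, 10)
print(begin, end, labels[begin:end])
```

Because end is exclusive, labels[begin:end] covers both endpoints and labels[end - 1] is the last label.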

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
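For a sorted index with repeated labels, the two sides can be sketched with the standard bisect module. This is an illustration under the assumption that the labels are sorted; the helper is hypothetical and ignores the kind parameter.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound(sorted_labels, label, side):
    if side == "left":
        return bisect_left(sorted_labels, label)   # leftmost position of label
    if side == "right":
        return bisect_right(sorted_labels, label)  # one past the rightmost position
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 5]
left_bound = get_slice_bound(labels, 2, "left")
right_bound = get_slice_bound(labels, 2, "right")
print(left_bound, right_bound)
```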

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
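The column-major to row-major conversion can be sketched in plain Python. The helper below is hypothetical, takes the columns as equal-length lists, and is not the cuDF implementation.

```python
def interleave_columns(columns):
    # zip(*columns) walks the table row by row; flattening the rows
    # yields the values in row-major order.
    return [value for row in zip(*columns) for value in row]


interleaved = interleave_columns([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
print(interleaved)
```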
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

property is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
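The membership test can be sketched in plain Python. The helper is hypothetical and returns a list of booleans rather than a CuPy array.

```python
def isin(values, candidates):
    lookup = set(candidates)  # set membership keeps each check cheap
    # One boolean per index value, so the result has the same length as values.
    return [v in lookup for v in values]


mask = isin([1, 2, 3], [1, 4])
print(mask)
```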
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
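
The rule above can be modeled in plain Python for readers without a GPU at hand. This is only a sketch of the semantics, not cuDF code: the helper name is_missing is hypothetical, None stands in for a null-mask entry, and NaT is not modeled.

```python
import math

def is_missing(value):
    """Hypothetical helper: None (null) and float NaN count as missing;
    empty strings and infinities do not."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)

values = [5.0, None, float("nan"), "", float("inf"), -float("inf")]
mask = [is_missing(v) for v in values]
print(mask)  # [False, True, True, False, False, False]
```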
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
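
The inner join in the example keeps the rows of lhs whose level 'a' value also occurs in rhs. A minimal plain-Python sketch of that filtering is given here; the function name inner_join_on_level is hypothetical, and the result order shown (order of lhs) is an assumption that happens to match the example output rather than a documented guarantee.

```python
def inner_join_on_level(tuples, names, other_values, other_name):
    """Keep the tuples whose value at the level named `other_name`
    also appears in `other_values` (illustrative only, not cuDF code)."""
    level = names.index(other_name)   # position of the shared level
    allowed = set(other_values)
    return [t for t in tuples if t[level] in allowed]

lhs = [(2, 3), (3, 4), (1, 2)]
joined = inner_join_on_level(lhs, ["a", "b"], [1, 4, 3], "a")
print(joined)  # [(3, 4), (1, 2)]
```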
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
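
The special cases in the outputs above (NaN for negative input, -inf for zero) can be reproduced with the standard library. This is a sketch of the element-wise behaviour only; elementwise_log is a hypothetical name, and Python's math.log would raise ValueError for non-positive input, so those cases are handled explicitly.

```python
import math

def elementwise_log(values):
    """Natural log of each element: 0 maps to -inf, negatives map to NaN."""
    out = []
    for x in values:
        if x > 0:
            out.append(math.log(x))
        elif x == 0:
            out.append(-math.inf)
        else:
            out.append(math.nan)
    return out

logs = elementwise_log([-1, 0, 1, 0.5, 100])
# exp undoes log for positive input, up to floating-point error
roundtrip = math.exp(logs[3])
```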
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
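
The replacement rule can be written out in a few lines of plain Python. This is a sketch of the semantics, not cuDF's implementation: the free function mask is hypothetical, lists stand in for Series, and None stands in for <NA>.

```python
def mask(values, cond, other=None):
    """Replace values[i] with `other` wherever cond[i] is True.

    `other` may be a scalar or a list with one entry per value.
    """
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        replacements = [other] * len(values)
    return [r if c else v for v, c, r in zip(values, cond, replacements)]

data = [4, 3, 2, 1, 0]
cond = [v > 2 for v in data]
print(mask(data, cond, 10))  # [10, 10, 2, 1, 0]
print(mask(data, cond))      # [None, None, 2, 1, 0]
```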
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
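
The dispatch described above, including the (callable, data_keyword) tuple form, can be sketched in plain Python. The free function pipe and the helpers add and scale are hypothetical names used for illustration; this is not cuDF's source.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj as the first argument, or as the named keyword
    when func is given as a (callable, data_keyword) tuple."""
    if isinstance(func, tuple):
        target, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return target(*args, **kwargs)
    return func(obj, *args, **kwargs)

def add(data, amount):      # hypothetical helper: data is the first argument
    return [x + amount for x in data]

def scale(factor, data):    # hypothetical helper: data arrives by keyword
    return [x * factor for x in data]

result = pipe(pipe([1, 2, 3], add, amount=1), (scale, "data"), factor=10)
print(result)  # [20, 30, 40]
```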
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
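
Since no example is given for rank, the tie-breaking methods are illustrated here with a plain-Python sketch. It models only the method and ascending parameters on a list of non-null values; na_option, pct and axis are left out, and the function is a hypothetical stand-in rather than cuDF's implementation.

```python
def rank(values, method="average", ascending=True):
    """Rank values from 1 to n, resolving ties according to `method`."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the run of tied values that starts at sorted position `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, idx in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[idx] = (pos + 1 + end + 1) / 2
            elif method == "min":
                ranks[idx] = pos + 1
            elif method == "max":
                ranks[idx] = end + 1
            elif method == "first":
                ranks[idx] = pos + 1 + offset
            elif method == "dense":
                ranks[idx] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        pos = end + 1
    return ranks

data = [3, 1, 3, 2]
print(rank(data))                  # [3.5, 1.0, 3.5, 2.0]
print(rank(data, method="first"))  # [3, 1, 4, 2]
```

The two 3s occupy sorted positions 3 and 4, so ‘average’ gives both 3.5, ‘min’ gives 3, ‘max’ gives 4, and ‘first’ assigns 3 and 4 in order of appearance.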

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
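
The scalar and per-element forms of repeats can be compared side by side in a plain-Python sketch. The free function repeat below is hypothetical and works on lists; it mirrors the outputs shown above but is not cuDF code.

```python
def repeat(values, repeats):
    """Repeat each element consecutively; `repeats` is a single int or
    one non-negative count per element."""
    counts = [repeats] * len(values) if isinstance(repeats, int) else list(repeats)
    if len(counts) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(c < 0 for c in counts):
        raise ValueError("repeats must be non-negative")
    return [v for v, c in zip(values, counts) for _ in range(c)]

print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```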
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
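
How decimals is matched against column names can be sketched in plain Python. The function round_columns is hypothetical and uses a dict of lists in place of a DataFrame. Python's built-in round resolves exact halves to the nearest even digit, which is an assumption of this sketch and may not match cuDF on such inputs.

```python
def round_columns(table, decimals):
    """Round each column of `table` (column name -> list of floats).

    An int applies to every column; a dict applies per column. Columns
    missing from the dict are left as is, and dict keys that are not
    columns are ignored.
    """
    if isinstance(decimals, int):
        decimals = {name: decimals for name in table}
    return {
        name: [round(x, decimals[name]) for x in column]
        if name in decimals else list(column)
        for name, column in table.items()
    }

table = {"dogs": [.21, .01, .66, .21], "cats": [.32, .67, .03, .18]}
rounded = round_columns(table, {"dogs": 1, "cats": 0, "birds": 2})
print(rounded)
```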
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
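
The interaction of n, frac, replace and random_state can be tried out without a GPU using the standard library. This sketch is not cuDF's sampler: the function sample works on a list, supports only an int seed, and rounding frac * len(values) to get the item count is an assumption.

```python
import random

def sample(values, n=None, frac=None, replace=False, random_state=None):
    """Draw a random sample; n and frac are mutually exclusive and
    n defaults to 1 when neither is given."""
    if n is not None and frac is not None:
        raise ValueError("pass only one of n or frac")
    if n is None:
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # an int seed makes the draw repeatable
    if replace:
        return [rng.choice(values) for _ in range(n)]
    return rng.sample(values, n)  # raises ValueError if n > len(values)

data = [1, 2, 3, 4, 5]
first = sample(data, n=3, random_state=42)
second = sample(data, n=3, random_state=42)
print(first == second)  # True: same seed, same sample
```

Sampling more items than the input holds is only possible with replace=True, which is why the Series example above passes it when asking for 10 items out of 5.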
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
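
No example is given for scatter_by_map, so here is a plain-Python sketch of the partitioning it describes. Lists of rows stand in for DataFrames, the function is a hypothetical stand-in, and the default for map_size (largest destination plus one) is an assumption of this sketch.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Split rows into map_size lists; row i goes to partition map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    if map_size is None:
        # Assumed default for this sketch: just enough partitions.
        map_size = max(map_index) + 1 if map_index else 0
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        if not 0 <= destination < map_size:
            raise ValueError("destination outside [0, map_size)")
        partitions[destination].append(row)
    return partitions

parts = scatter_by_map(["a", "b", "c", "d"], [0, 1, 0, 2], map_size=4)
print(parts)  # [['a', 'c'], ['b'], ['d'], []]
```

A map_size larger than the number of destinations simply yields trailing empty partitions.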
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into Self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
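
For a single ascending column, the side parameter behaves like Python's bisect_left and bisect_right. The sketch below is plain Python on lists, with a hypothetical free function searchsorted; it does not model ascending=False, na_position or multi-column frames.

```python
from bisect import bisect_left, bisect_right

def searchsorted(sorted_values, values, side="left"):
    """Insertion points that keep `sorted_values` in ascending order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    if isinstance(values, (list, tuple)):
        return [find(sorted_values, v) for v in values]
    return find(sorted_values, values)

s = [1, 2, 3]
print(searchsorted(s, 4))                     # 3
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]
```

Binary search assumes sorted input, which is why the unsorted Series above returns a misleading position.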
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
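
The relationship between the sorted values and the indexer can be sketched in plain Python. The free function sort_values below is hypothetical and operates on lists (tuples stand in for MultiIndex entries); it is not cuDF's implementation.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Return a sorted copy and, optionally, the positions that sort it."""
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    sorted_values = [values[i] for i in indexer]
    return (sorted_values, indexer) if return_indexer else sorted_values

idx = [10, 100, 1, 1000]
print(sort_values(idx))  # [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```

Taking the original values at the indexer positions reproduces the sorted result, which is what makes the indexer useful for reordering other data in the same way.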
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
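
Tiling repeats the whole block of rows, in contrast to repeating each row consecutively. A plain-Python sketch with lists of rows follows; the free function tile is a hypothetical stand-in, not cuDF code.

```python
def tile(rows, count):
    """Stack `count` copies of the full list of rows, one after another."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]

rows = [[8, 4, 7], [5, 2, 3]]
print(tile(rows, 2))  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```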
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
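
The keep-or-replace rule can be written out in plain Python. The free function where below is a hypothetical sketch on lists, with None standing in for <NA>; it is not cuDF's implementation.

```python
def where(values, cond, other=None):
    """Keep values[i] where cond[i] is True; otherwise use `other`
    (a scalar, or a list with one entry per value)."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        replacements = [other] * len(values)
    return [v if c else r for v, c, r in zip(values, cond, replacements)]

index = [4, 3, 2, 1, 0]
cond = [v > 2 for v in index]
print(where(index, cond, 15))  # [4, 3, 15, 15, 15]
```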

UInt8Index

class cudf.core.index.UInt8Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
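The inverse relationship stated above can be checked on the host with Python's standard math module. This is only a plain-Python sketch of the arithmetic, not cuDF code; the input values are taken from the Index example.

```python
import math

# Values from the Index example above; every input to acos lies in [-1, 1].
values = [-1, 0.4, 1, 0, 0.3]

# Element-wise inverse cosine, computed on the host with the standard library.
acos_values = [math.acos(v) for v in values]

# Round trip: cos(acos(v)) recovers v up to floating-point error.
round_trip = [math.cos(a) for a in acos_values]
```

The results lie in [0, π], which is why acos(-1) is reported as 3.141592653589793 in the example output.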
any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
array

A cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
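To make the meaning of the returned positions concrete, here is a minimal plain-Python sketch of what argsort computes. The helper name `argsort` is hypothetical and works on lists on the host; it is not how cuDF implements the method, and tie-breaking for equal values in descending order is an assumption of this sketch.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort `values` (hypothetical helper)."""
    # Sort the positions 0..n-1 by the value stored at each position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Descending order is taken here as the reverse of the ascending order.
    return order if ascending else order[::-1]


flat = [10, 100, 1, 1000]
flat_order = argsort(flat)
flat_order_desc = argsort(flat, ascending=False)

# Tuples compare lexicographically, like the rows of the MultiIndex example.
rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
rows_order = argsort(rows)
rows_order_desc = argsort(rows, ascending=False)
```

Indexing the original data with the returned positions yields the sorted sequence, for example `[flat[i] for i in flat_order]`.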
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
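The scalar-threshold behaviour shown in the Series examples can be sketched in plain Python as follows. The helper `clip` below is hypothetical and operates on a list; it only illustrates the rule that values outside the boundary are assigned the boundary value, and that a threshold of None disables clipping on that side.

```python
def clip(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None means no bound (sketch)."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the minimum threshold are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the maximum threshold are lowered to it
        clipped.append(v)
    return clipped


data = [1, 2, 3, 4]
both = clip(data, lower=2, upper=3)
upper_only = clip(data, upper=3)
lower_only = clip(data, lower=2)
```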
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
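The effect of the sort parameter can be illustrated with a small plain-Python sketch. The `difference` helper below is hypothetical and works on lists; it assumes set semantics (each surviving value appears once) and is not cuDF's implementation.

```python
def difference(index, other, sort=None):
    """Values of `index` that are not in `other` (hypothetical helper)."""
    excluded = set(other)
    seen = set()
    result = []
    for v in index:
        # Keep the first occurrence of each value that is absent from `other`.
        if v not in excluded and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: fall back to the unsorted order.
            pass
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_diff = difference(idx1, idx2)
unsorted_diff = difference(idx1, idx2, sort=False)
```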
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
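The three keep options differ only in which occurrence survives. The following plain-Python sketch, using a hypothetical `drop_duplicates` helper on a list, shows the difference. Note one assumption: the sketch preserves the input order, whereas the example output above appears sorted, so only the set of surviving values should be compared with cuDF.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Hypothetical helper mirroring the three `keep` options on a list."""
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    # For 'last', scan from the end so the final occurrence is seen first.
    ordered = values if keep == "first" else values[::-1]
    seen, kept = set(), []
    for v in ordered:
        if v not in seen:
            seen.add(v)
            kept.append(v)
    return kept if keep == "first" else kept[::-1]


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
keep_first = drop_duplicates(animals)
keep_last = drop_duplicates(animals, keep="last")
keep_none = drop_duplicates(animals, keep=False)
```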
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
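The difference between how='any' and how='all' on a MultiIndex can be sketched with plain tuples, using None to stand in for a null level value. The helper `dropna_rows` is hypothetical and is not part of cuDF.

```python
def dropna_rows(rows, how="any"):
    """Drop tuples whose levels are null under the given rule (sketch)."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    # 'any': drop when at least one level is null; 'all': only when every level is.
    rule = any if how == "any" else all
    return [row for row in rows if not rule(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
dropped_any = dropna_rows(rows, how="any")
dropped_all = dropna_rows(rows, how="all")
```

With these rows, how='all' removes nothing because no row is null in every level.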
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at position end - 1.
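As an illustration of the half-open (begin, end) convention described above, here is a plain-Python sketch on a list of labels. The helper is hypothetical, assumes both labels are present (list.index raises ValueError otherwise), and is not cuDF's implementation.

```python
def find_label_range(labels, first, last):
    """Return (begin, end) so that labels[begin:end] spans first..last (sketch)."""
    # Position of the first occurrence of `first`.
    begin = labels.index(first)
    # One past the position of the last occurrence of `last`.
    end = len(labels) - labels[::-1].index(last)
    return begin, end


labels = [10, 20, 30, 40, 50]
begin, end = find_label_range(labels, 20, 40)
selected = labels[begin:end]
```

Because end is exclusive, the pair can be used directly as slice bounds.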

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
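For a sorted index, the left and right bounds described above correspond to the two bisection functions in Python's standard library. The sketch below assumes sorted labels and ignores the kind parameter; the helper name is hypothetical and this is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound(sorted_labels, label, side):
    """Leftmost position of `label`, or one past the rightmost (sketch)."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 3]
left = get_slice_bound(labels, 2, "left")
right = get_slice_bound(labels, 2, "right")
```

Slicing with `labels[left:right]` then selects every occurrence of the label.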

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
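The column-major to row-major conversion can be sketched in plain Python by taking one value from each column in turn. The helper below is hypothetical, assumes columns of equal length, and works on lists rather than on a cuDF table.

```python
def interleave_columns(columns):
    """Flatten equal-length columns row by row into one list (sketch)."""
    # zip(*columns) yields one tuple per row, holding that row's value
    # from every column; flattening the rows gives the interleaved order.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave_columns(columns)
```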
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is equal to or greater than the previous one).
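The two monotonicity properties allow equal neighbouring values. A plain-Python sketch of the checks, with hypothetical helper names and list inputs, looks like this:

```python
def is_monotonic_increasing(values):
    """True if no value is smaller than the one before it (sketch)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if no value is larger than the one before it (sketch)."""
    return all(a >= b for a, b in zip(values, values[1:]))


with_ties = [1, 2, 2, 3]
unordered = [1, 3, 2]
```

In this sketch an empty or single-element sequence satisfies both checks; whether cuDF treats those cases the same way is not stated in this section.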

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
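The membership test can be sketched on the host in plain Python. The `isin` helper below is hypothetical and returns a list of booleans rather than a CuPy array; it only illustrates that the result has one entry per index value.

```python
def isin(index_values, values):
    """One boolean per index value: is it among `values`? (sketch)"""
    lookup = set(values)  # set membership keeps each check O(1) on average
    return [v in lookup for v in index_values]


idx = [1, 2, 3]
contained = isin(idx, [1, 4])
```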
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
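The classification rules listed above can be sketched for plain Python values. In this sketch None stands in for an entry whose null mask is set, NaT is not covered, and the helper `is_na` is hypothetical rather than part of cuDF.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing; everything else is present."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, float("nan"), 0.32, float("inf"), ""]
na_mask = [is_na(v) for v in values]
```

Infinity and the empty string are both reported as present, matching the rule stated above.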
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
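The NaN and -inf entries in the Series output above follow IEEE floating-point conventions. Python's math.log raises ValueError for those inputs instead, so the plain-Python sketch below handles them explicitly; the helper `log_ieee` is hypothetical and not part of cuDF.

```python
import math


def log_ieee(x):
    """Natural log with NaN for negative input and -inf for zero (sketch)."""
    if x < 0:
        return math.nan
    if x == 0:
        return -math.inf
    return math.log(x)


inputs = [-1, 0, 1, 0.5, 100]
logs = [log_ieee(x) for x in inputs]

# For positive input, exp undoes log up to floating-point error.
recovered = math.exp(log_ieee(100))
```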
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only Scalar or array like with scalars or dataframe with same dimension as self.

Series expects only scalar or series like with same length

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
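The replacement rule can be sketched on plain lists: an entry is replaced where the condition is True and kept otherwise. The `mask` helper below is hypothetical, takes a scalar other, and uses None to stand in for <NA>.

```python
def mask(values, cond, other=None):
    """Replace values where `cond` is True with `other` (sketch)."""
    return [other if c else v for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
replaced = mask(ser, cond, 10)
nulled = mask(ser, cond)
```

This is the complement of where, which keeps values where the condition is True and replaces them where it is False.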
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Empty strings '' and, for float dtypes, inf are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
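
The missing-value rules listed for notnull can be mimicked in plain Python. This is a minimal sketch for illustration only, not cuDF's implementation: the helper name is_not_missing is hypothetical, and None stands in for an entry whose null mask is set.

```python
import math

def is_not_missing(value):
    """Hypothetical helper: False for None (null) or a float NaN, True otherwise."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True

# inf and the empty string are ordinary values, so they map to True.
values = [5.0, None, float("nan"), float("inf"), ""]
mask = [is_not_missing(v) for v in values]
print(mask)  # [True, False, False, True, True]
```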
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
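
The dispatch described above can be sketched in plain Python. This is an illustrative stand-in, assuming lists in place of a Series/DataFrame/Index; the free function pipe and the helpers scale and shift are hypothetical names, not part of cuDF.

```python
def pipe(obj, func, *args, **kwargs):
    """Hypothetical stand-in for obj.pipe(func, *args, **kwargs)."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)

def scale(data, factor):
    return [x * factor for x in data]

def shift(offset, data):
    # The data is the second parameter here, so it is passed by keyword.
    return [x + offset for x in data]

result = pipe(pipe([1, 2, 3], scale, factor=10), (shift, "data"), offset=1)
print(result)  # [11, 21, 31]
```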
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
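
The tie-breaking methods can be compared on a small input. The following is a pure-Python sketch under stated assumptions (ascending order, no missing values); the function name rank is a hypothetical stand-in and does not reflect how cuDF computes ranks on the GPU.

```python
def rank(values, method="average"):
    """Hypothetical pure-Python ranking of a list without missing values."""
    # sorted() is stable, so tied values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # Find the run of tied values order[start:end].
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[start:end]):
            if method == "average":
                ranks[i] = (start + 1 + end) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = end
            elif method == "first":
                ranks[i] = start + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        start = end
    return ranks

data = [7, 3, 7, 1]
print(rank(data, "average"))  # [3.5, 2.0, 3.5, 1.0]
print(rank(data, "min"))      # [3, 2, 3, 1]
print(rank(data, "max"))      # [4, 2, 4, 1]
print(rank(data, "first"))    # [3, 2, 4, 1]
print(rank(data, "dense"))    # [3, 2, 3, 1]
```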

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
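
The scalar and per-element forms of repeats behave as in the following plain-Python sketch. It assumes lists in place of cuDF objects, and the helper name repeat is hypothetical.

```python
def repeat(values, repeats):
    """Hypothetical helper mirroring the scalar and per-element forms."""
    if isinstance(repeats, int):
        # A single int applies the same count to every element.
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    return [v for v, n in zip(values, repeats) for _ in range(n)]

print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```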
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
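
The handling of columns missing from decimals, and of decimals entries that name no column, can be sketched in plain Python. The helper round_columns and the dict-of-lists table are assumptions made for illustration. Python's built-in round rounds halves to even, which may differ from cuDF on exact ties; the values here avoid such ties.

```python
def round_columns(table, decimals):
    """Hypothetical helper: round only the columns named in decimals."""
    return {
        name: [round(x, decimals[name]) for x in column]
        if name in decimals
        else list(column)  # columns not listed are left as is
        for name, column in table.items()
    }

table = {"dogs": [0.21, 0.01, 0.66, 0.21], "cats": [0.32, 0.67, 0.03, 0.18]}
# 'birds' is not a column of the table, so it is ignored.
rounded = round_columns(table, {"dogs": 1, "birds": 2})
print(rounded["dogs"])  # [0.2, 0.0, 0.7, 0.2]
print(rounded["cats"])  # [0.32, 0.67, 0.03, 0.18]
```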
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
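
The interplay of n, replace and random_state can be illustrated with the standard library. This is only a sketch: the helper sample is hypothetical, it works on a list, and Python's random module will not reproduce the rows cuDF selects for the same seed.

```python
import random

def sample(items, n, replace=False, random_state=None):
    """Hypothetical helper: draw n items, optionally with replacement."""
    rng = random.Random(random_state)  # a fixed seed gives repeatable draws
    if replace:
        return [rng.choice(items) for _ in range(n)]
    # Without replacement, n may not exceed len(items) (ValueError otherwise).
    return rng.sample(items, n)

data = [1, 2, 3, 4, 5]
first = sample(data, 3, random_state=42)
second = sample(data, 3, random_state=42)
print(first == second)  # True: same seed, same sample

with_replacement = sample(data, 10, replace=True, random_state=0)
print(len(with_replacement))  # 10, larger than the input
```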
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
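
A plain-Python sketch of the scatter step follows. It assumes map_index holds integer destinations in the range [0, map_size) and uses lists of row labels in place of DataFrames; the helper name mirrors the method but is a hypothetical stand-in.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: send each row to the partition named by map_index."""
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)
    return parts

rows = ["r0", "r1", "r2", "r3"]
parts = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
# A map_size larger than the number of destinations leaves empty partitions.
print(parts)  # [['r1'], ['r0', 'r2'], ['r3'], []]
```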
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 17, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  17
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
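
For a single ascending column, the ‘left’ and ‘right’ sides correspond to the standard library's bisect functions. The sketch below reproduces the Series example on a plain list; it illustrates the semantics only and is not how cuDF performs the search.

```python
from bisect import bisect_left, bisect_right

sorted_values = [1, 2, 3]

# side='left': first position where the value could be inserted.
left = [bisect_left(sorted_values, v) for v in [1, 3]]
# side='right': position just past any existing equal values.
right = [bisect_right(sorted_values, v) for v in [1, 3]]

print(left)                           # [0, 2]
print(right)                          # [1, 3]
print(bisect_left(sorted_values, 4))  # 3
```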
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
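
The relationship between the sorted index and the indexer returned with return_indexer=True can be sketched on a plain list. The variable names here are illustrative, and the example assumes no duplicate values.

```python
values = [10, 100, 1, 1000]

# Positions of the original elements, ordered by descending value.
indexer = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
# Gathering the original values through the indexer yields the sorted result.
sorted_values = [values[i] for i in indexer]

print(sorted_values)  # [1000, 100, 10, 1]
print(indexer)        # [3, 1, 0, 2]
```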
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
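
tile differs from repeat in the order of the output rows: tile repeats the whole table, while repeat repeats each row in place. A plain-Python sketch, using a list of tuples as a stand-in for the rows of a DataFrame:

```python
rows = [(8, 4, 7), (5, 2, 3)]
count = 2

# tile: the full sequence of rows is repeated count times.
tiled = rows * count
# repeat: each row is repeated count times before moving to the next.
repeated = [row for row in rows for _ in range(count)]

print(tiled)     # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
print(repeated)  # [(8, 4, 7), (8, 4, 7), (5, 2, 3), (5, 2, 3)]
```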
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
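
The element-wise selection performed by where can be written out in plain Python. This sketch assumes a scalar other and a list in place of an Index; the helper name where is a hypothetical stand-in.

```python
def where(values, cond, other=None):
    """Hypothetical helper: keep values where cond is True, else use other."""
    return [v if keep else other for v, keep in zip(values, cond)]

values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
print(where(values, cond, 15))  # [4, 3, 15, 15, 15]
print(where(values, cond))      # [4, 3, None, None, None]
```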

UInt16Index

class cudf.core.index.UInt16Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using the ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
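
The MultiIndex results above are consistent with ordering the entries lexicographically, first by x and then by y. Python tuples compare the same way, so the example can be checked with a plain-Python sketch (the list of tuples is a stand-in for the MultiIndex and has no duplicate entries):

```python
entries = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]

# Tuples compare element by element, i.e. by x first and then by y.
ascending = sorted(range(len(entries)), key=lambda i: entries[i])
descending = sorted(range(len(entries)), key=lambda i: entries[i], reverse=True)

print(ascending)   # [4, 0, 1, 2, 3]
print(descending)  # [3, 2, 1, 0, 4]
```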
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
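
The scalar clipping rule for a Series can be sketched in plain Python. This is an illustrative model of the semantics described above, not the cuDF implementation; the helper name clip_values is hypothetical, and only scalar bounds are handled.

```python
def clip_values(values, lower=None, upper=None):
    """Return a new list with each value limited to [lower, upper].

    A bound of None disables clipping on that side.
    """
    result = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower bound
        if upper is not None and v > upper:
            v = upper  # cap values above the upper bound
        result.append(v)
    return result


data = [1, 2, 3, 4]
print(clip_values(data, lower=2, upper=3))
print(clip_values(data, upper=3))
print(clip_values(data, lower=2))
```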
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
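
The effect of the sort parameter can be modelled in plain Python. This is a sketch under the assumption that the result contains each surviving value once, as in a mathematical set difference; the helper name index_difference is hypothetical and this is not cuDF code.

```python
def index_difference(left, right, sort=None):
    """Values of `left` that are absent from `right`, without repeats."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep order of appearance
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))
```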
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
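
The three keep options can be compared side by side with a plain-Python sketch. It is an illustration of the documented semantics, not cuDF code, and the helper name dedupe is hypothetical. Note that this sketch preserves order of appearance, whereas the cuDF output above is shown sorted.

```python
from collections import Counter


def dedupe(values, keep="first"):
    """Remove duplicates according to the `keep` policy."""
    if keep == "first":
        return list(dict.fromkeys(values))
    if keep == "last":
        # Deduplicate from the right, then restore left-to-right order.
        return list(reversed(list(dict.fromkeys(reversed(values)))))
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(dedupe(animals, keep="first"))
print(dedupe(animals, keep="last"))
print(dedupe(animals, keep=False))
```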
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
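
The difference between how='any' and how='all' on a MultiIndex can be sketched with tuples in plain Python. This is an assumption-labelled model, not cuDF code: None stands in for a null entry and the helper name drop_null_rows is hypothetical.

```python
def drop_null_rows(rows, how="any"):
    """Drop tuples containing nulls (None) according to `how`."""
    if how == "any":
        # Keep a row only if every level is present.
        return [r for r in rows if all(x is not None for x in r)]
    if how == "all":
        # Keep a row unless every level is null.
        return [r for r in rows if any(x is not None for x in r)]
    raise ValueError("how must be 'any' or 'all'")


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(drop_null_rows(rows, how="any"))
print(drop_null_rows(rows, how="all"))
```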
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
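
The column-major to row-major conversion can be sketched in plain Python. This is an illustrative model, not cuDF code; the helper name interleave is hypothetical and each inner list is assumed to represent one column.

```python
def interleave(columns):
    """Flatten equal-length columns row by row into a single list."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flatten those tuples.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave(columns))
```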
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is equal to or greater than the previous one).

property is_unique

Return whether the index has unique values.
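
The monotonicity and uniqueness properties can be expressed as short plain-Python predicates. This is a sketch of the definitions given above, not cuDF code; the function names mirror the properties but operate on ordinary lists.

```python
def is_monotonic_increasing(values):
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is <= the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values):
    """True if no value occurs more than once."""
    return len(set(values)) == len(values)


print(is_monotonic_increasing([1, 2, 2, 3]))
print(is_monotonic_decreasing([3, 2, 2, 1]))
print(is_unique([1, 2, 2, 3]))
```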

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
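
The classification rules listed above can be modelled in plain Python. This is a simplified sketch, not cuDF code: None stands in for an entry whose null mask is set, NaT is not modelled, and the helper name is_missing is hypothetical.

```python
import math


def is_missing(values):
    """True for None or float NaN; False for everything else."""
    return [
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    ]


# inf and the empty string are ordinary values, not missing ones.
print(is_missing([1, 2, None, math.nan, 0.32, math.inf]))
print(is_missing(["Alfred", "", None]))
```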
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
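
The Series example above shows NaN for negative inputs and -inf for zero. A plain-Python sketch of that element-wise behaviour follows; it is not cuDF code, and the helper name log_elementwise is hypothetical. The explicit branches are needed because math.log raises ValueError for non-positive inputs.

```python
import math


def log_elementwise(values):
    """Natural log per element, with NaN below zero and -inf at zero."""
    out = []
    for v in values:
        if v < 0:
            out.append(math.nan)
        elif v == 0:
            out.append(-math.inf)
        else:
            out.append(math.log(v))
    return out


result = log_elementwise([-1, 0, 1, 100])
print(result)
```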
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
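
The replacement rule for a Series with a scalar other can be sketched in plain Python. This is an illustrative model, not cuDF code; None stands in for <NA>, and the helper name mask_values is hypothetical.

```python
def mask_values(values, cond, other=None):
    """Replace entries where `cond` is True with `other`."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [other if c else v for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask_values(ser, cond, 10))
print(mask_values(ser, cond))
```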
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
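
The dispatch between a plain callable and a (callable, data_keyword) tuple can be sketched as a free function in plain Python. This is an assumption-labelled illustration of the behaviour described above, not the cuDF source; scale and add_offset are hypothetical helpers.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj, either positionally or via a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def scale(data, factor):
    return [x * factor for x in data]


def add_offset(offset, data):
    # Data is not the first parameter, so it must be passed by keyword.
    return [x + offset for x in data]


scaled = pipe([1, 2, 3], scale, factor=2)
result = pipe(scaled, (add_offset, "data"), offset=1)
print(result)
```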
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average : average rank of the group

  • min : lowest rank in the group

  • max : highest rank in the group

  • first : ranks assigned in order they appear in the array

  • dense : like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep : assign NaN rank to NaN values

  • top : assign smallest rank to NaN values if ascending

  • bottom : assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
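
The tie-breaking methods can be compared with a plain-Python sketch. It models only the method and ascending parameters on a one-dimensional list; na_option, pct, axis and numeric_only are not modelled, the helper name rank_values is hypothetical, and this is not cuDF code.

```python
def rank_values(values, method="average", ascending=True):
    """Rank a list of comparable values, resolving ties by `method`."""
    # Stable sort keeps tied values in order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the end of the current group of tied values.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[i] = (pos + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = float(pos + 1)
            elif method == "max":
                ranks[i] = float(end + 1)
            elif method == "first":
                ranks[i] = float(pos + 1 + offset)
            elif method == "dense":
                ranks[i] = float(dense)
            else:
                raise ValueError(f"unknown method: {method}")
        pos = end + 1
    return ranks


scores = [7, 2, 7, 1, 9]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_values(scores, method=m))
```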

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type(DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
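
The two accepted forms of repeats (a single int or one count per element) can be sketched in plain Python. This is an illustrative model that ignores index labels; the helper name repeat_values is hypothetical and this is not cuDF code.

```python
def repeat_values(values, repeats):
    """Repeat each element consecutively `repeats` times."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat_values([0, 2], [3, 4]))
print(repeat_values([0, 2], 2))
print(repeat_values([0, 2], 0))
```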
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
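
The per-column lookup of decimals can be sketched in plain Python, with a dict of lists standing in for a DataFrame. This is not cuDF code and the helper name round_columns is hypothetical. Python's built-in round uses round-half-to-even, which may differ from cuDF on exact halfway values; the sample data avoids such cases.

```python
def round_columns(table, decimals=0):
    """Round each column to its own number of decimal places."""
    rounded = {}
    for name, column in table.items():
        if isinstance(decimals, dict):
            places = decimals.get(name)  # None if the column is not listed
        else:
            places = decimals
        if places is None:
            rounded[name] = list(column)  # unlisted columns are left as is
        else:
            rounded[name] = [round(v, places) for v in column]
    return rounded


table = {"dogs": [0.21, 0.01, 0.66, 0.21], "cats": [0.32, 0.67, 0.03, 0.18]}
print(round_columns(table, 1))
print(round_columns(table, {"dogs": 1, "cats": 0}))
```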
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
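
How n, frac, replace and random_state interact can be sketched with the standard random module. This is a plain-Python illustration along a single axis, not cuDF code; the helper name sample_values is hypothetical, weights are not modelled, and rounding frac * len(values) is an assumption.

```python
import random


def sample_values(values, n=None, frac=None, replace=False, random_state=None):
    """Draw a random sample; an int random_state makes it reproducible."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)
    if replace:
        # The same item may be drawn more than once.
        return [rng.choice(values) for _ in range(n)]
    return rng.sample(values, n)


data = [1, 2, 3, 4, 5]
first = sample_values(data, n=3, random_state=0)
second = sample_values(data, n=3, random_state=0)
print(first == second)
```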
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
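
The row-to-destination assignment can be sketched in plain Python, with lists of rows standing in for DataFrames. This is not cuDF code; the helper name scatter_rows is hypothetical, and the sketch assumes map_index holds integer destinations in range(map_size).

```python
def scatter_rows(rows, map_index, map_size=None):
    """Split rows into map_size lists according to map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    needed = max(map_index) + 1 if map_index else 0
    if map_size is None:
        map_size = needed
    elif map_size < needed:
        raise ValueError("map_size is too small for map_index")
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)
    return parts


rows = ["a", "b", "c", "d"]
print(scatter_rows(rows, [1, 0, 1, 2]))
print(scatter_rows(rows, [1, 0, 1, 2], map_size=4))
```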
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
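
For a single ascending column, the effect of side can be reproduced on the host with Python's bisect module. This is a sketch of the semantics only, not how cuDF computes the result on the GPU; the wrapper name searchsorted_host is hypothetical.

```python
from bisect import bisect_left, bisect_right

def searchsorted_host(sorted_values, values, side="left"):
    """Return insertion points that keep sorted_values in ascending order."""
    # 'left' inserts before any equal entries, 'right' inserts after them.
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]

s = [1, 2, 3]
left = searchsorted_host(s, [1, 3], side="left")    # [0, 2]
right = searchsorted_host(s, [1, 3], side="right")  # [1, 3]
outside = searchsorted_host(s, [0, 4])              # [0, 3]
```

The two sides only differ for values that already occur in the sorted data.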
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
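
The inputs 45, 90, 180 and 360 in the examples above are interpreted as radians, not degrees. A short host-side check with Python's math module (used here in place of cuDF) shows the difference and how to convert beforehand if degrees are intended.

```python
import math

angles = [45, 90, 180, 360]

# Treating the numbers as radians reproduces the values shown above.
as_radians = [math.sin(x) for x in angles]

# Converting from degrees first gives the familiar textbook values.
from_degrees = [math.sin(math.radians(x)) for x in angles]
```

With cuDF the same conversion would have to be applied to the column before calling sin().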
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
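
The relationship between the sorted index and the indexer returned with return_indexer=True can be sketched in plain Python. The helper sort_with_indexer is hypothetical and works on a list instead of a cuDF Index; the order of tied values in cuDF may differ from Python's stable sort.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted values, positions that produce that order)."""
    indexer = sorted(range(len(values)), key=values.__getitem__,
                     reverse=not ascending)
    # Gathering the original values through the indexer yields the sorted copy.
    return [values[i] for i in indexer], indexer

idx = [10, 100, 1, 1000]
sorted_idx, indexer = sort_with_indexer(idx, ascending=False)
# sorted_idx == [1000, 100, 10, 1], indexer == [3, 1, 0, 2]
```

The indexer is useful for reordering other data that is aligned with the index.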
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
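
Tiling repeats the whole block of rows, in contrast to repeating each row consecutively. A plain-Python sketch on a list of rows (the helper tile_rows is hypothetical and not part of cuDF):

```python
def tile_rows(rows, count):
    """Repeat the full sequence of rows count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # List multiplication keeps the original row order inside every copy.
    return list(rows) * count

tiled = tile_rows([[8, 4, 7], [5, 2, 3]], 2)
# tiled == [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```

A count of 0 produces an empty result.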
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller than the input.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
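
The element-wise rule (keep where cond is True, otherwise take the value from other) can be sketched in plain Python. The helper where_host is hypothetical and operates on lists; it is not cuDF's implementation.

```python
def where_host(values, cond, other=None):
    """Keep values[i] where cond[i] is True, otherwise use other."""
    if isinstance(other, (list, tuple)):
        # Array-like replacement: pick the entry at the same position.
        return [v if c else o for v, c, o in zip(values, cond, other)]
    # Scalar replacement (or None) is used for every False position.
    return [v if c else other for v, c in zip(values, cond)]

index = [4, 3, 2, 1, 0]
cond = [v > 2 for v in index]
kept = where_host(index, cond, 15)                        # [4, 3, 15, 15, 15]
masked = where_host(index, [not c for c in cond], 15)     # [15, 15, 2, 1, 0]
```

Inverting the condition gives the behavior documented for mask, which replaces values where the condition is True.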

UInt32Index

class cudf.core.index.UInt32Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
           names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using the ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
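
For a MultiIndex the rows compare level by level, which matches the way Python compares tuples. The following host-side sketch (the helper argsort_host is hypothetical, not cuDF code) reproduces the two results above.

```python
def argsort_host(rows, ascending=True):
    """Return the positions that would sort rows lexicographically."""
    # Tuples compare on the first level, then the second, and so on.
    return sorted(range(len(rows)), key=lambda i: rows[i],
                  reverse=not ascending)

rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
asc = argsort_host(rows)                    # [4, 0, 1, 2, 3]
desc = argsort_host(rows, ascending=False)  # [3, 2, 1, 0, 4]
```

The rows (1, 1) and (1, 5) tie on the first level, so the second level decides their order.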
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
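
The scalar case of clipping can be sketched in plain Python to make the role of None explicit. The helper clip_host is hypothetical and works on a list, not a cuDF object.

```python
def clip_host(values, lower=None, upper=None):
    """Limit each value to the closed interval [lower, upper]."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower   # raise values that fall below the lower bound
        if upper is not None and v > upper:
            v = upper   # cap values that exceed the upper bound
        out.append(v)
    return out

sr = [1, 2, 3, 4]
both = clip_host(sr, lower=2, upper=3)   # [2, 2, 3, 3]
upper_only = clip_host(sr, upper=3)      # [1, 2, 3, 3]
lower_only = clip_host(sr, lower=2)      # [2, 2, 3, 4]
```

Passing None for one bound leaves that side unclipped, which mirrors the Series examples above.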
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
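
The effect of sort on the set difference can be sketched in plain Python. The helper difference_host is hypothetical; it illustrates the documented behavior on lists and is not cuDF's implementation.

```python
def difference_host(left, right, sort=None):
    """Elements of left that are not in right, without duplicates."""
    exclude = set(right)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    result = [v for v in dict.fromkeys(left) if v not in exclude]
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: fall back to the unsorted result.
            pass
    return result

idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_diff = difference_host(idx1, idx2)                # [1, 2]
unsorted_diff = difference_host(idx1, idx2, sort=False)  # [2, 1]
```

With sort=False the result keeps the order in which the elements appear in the calling index.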
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
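
The three keep options can be compared with a plain-Python sketch. The helper drop_duplicates_host is hypothetical and preserves the order of appearance, whereas the cuDF output shown above appears in sorted order; only the set of surviving values is meant to be illustrated.

```python
from collections import Counter

def drop_duplicates_host(values, keep="first"):
    """Remove duplicate entries according to keep."""
    if keep == "first":
        # Keep the first occurrence of every value.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Keep the last occurrence by deduplicating the reversed sequence.
        return list(reversed(list(dict.fromkeys(reversed(values)))))
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")

idx = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
first = drop_duplicates_host(idx)                # ['lama', 'cow', 'beetle', 'hippo']
last = drop_duplicates_host(idx, keep="last")    # ['cow', 'beetle', 'lama', 'hippo']
none = drop_duplicates_host(idx, keep=False)     # ['cow', 'beetle', 'hippo']
```

Only keep=False removes 'lama' entirely, because it is the only value that occurs more than once.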
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
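
The difference between how='any' and how='all' only shows up for a MultiIndex. A plain-Python sketch with rows as tuples and None standing in for a null (the helper dropna_host is hypothetical):

```python
def dropna_host(rows, how="any"):
    """Drop rows whose levels are null according to how."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    # 'any' drops a row with at least one null level,
    # 'all' drops a row only if every level is null.
    is_dropped = any if how == "any" else all
    return [row for row in rows if not is_dropped(v is None for v in row)]

rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
any_null = dropna_host(rows)              # [(1, 1), (1, 5), (4, 2)]
all_null = dropna_host(rows, how="all")   # unchanged: no row is entirely null
```

With the data from the example above, how='all' removes nothing because every row has at least one non-null level.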
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
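
The inverse relationship between exp and log stated above can be checked on the host with Python's math module, used here in place of cuDF. The round trip is exact only up to floating-point rounding.

```python
import math

values = [-1, 0.4, 1, 0, 0.3]

# Element-wise exponential, matching the Index example above.
exps = [math.exp(v) for v in values]

# Taking the natural logarithm recovers the inputs (up to rounding).
round_trip = [math.log(e) for e in exps]
max_error = max(abs(a - b) for a, b in zip(values, round_trip))
```

The largest deviation stays at the level of double-precision rounding error.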
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.
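
What integer encoding means, and where na_sentinel ends up, can be sketched in plain Python. The helper factorize_host is hypothetical; it numbers the unique values in order of first appearance, which is an assumption of this sketch, and the order of the uniques returned by cuDF may differ.

```python
def factorize_host(values, na_sentinel=-1):
    """Return (labels, uniques) where labels index into uniques."""
    uniques = []
    positions = {}
    labels = []
    for v in values:
        if v is None:
            # Nulls get the sentinel and are not added to uniques.
            labels.append(na_sentinel)
            continue
        if v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
        labels.append(positions[v])
    return labels, uniques

labels, uniques = factorize_host(["b", None, "a", "b", "c"])
# labels == [0, -1, 1, 0, 2], uniques == ['b', 'a', 'c']
```

Every non-null input can be recovered as uniques[label].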

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at position end - 1.
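
The half-open (begin, end) convention can be sketched with the standard-library bisect module. This is only an illustration under the assumption that the labels are sorted in ascending order; find_label_range_sketch is a hypothetical helper, not the cuDF method.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sketch(sorted_labels, first, last):
    """Return (begin, end) so that sorted_labels[begin:end] spans first..last."""
    begin = bisect_left(sorted_labels, first)   # leftmost position of `first`
    end = bisect_right(sorted_labels, last)     # one past the rightmost `last`
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_sketch(labels, 20, 30)
print(begin, end)
print(labels[begin:end])  # both endpoints are included in the slice
```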

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
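
The column-major to row-major conversion described above can be sketched in plain Python. The helper interleave_columns_sketch is hypothetical and works on lists rather than GPU columns.

```python
def interleave_columns_sketch(columns):
    """Flatten equal-length columns into one list, reading row by row."""
    # zip(*columns) yields one tuple per row; flattening the rows in order
    # gives the row-major layout.
    return [value for row in zip(*columns) for value in row]


result = interleave_columns_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]])
print(result)
```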
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is equal to or greater than the previous one).
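
As an informal illustration of the two monotonicity checks, the sketch below compares each element with its successor. The function names are hypothetical and the code operates on plain lists, not on a cuDF Index.

```python
def is_monotonic_increasing_sketch(values):
    """True if every value is >= the one before it (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing_sketch(values):
    """True if every value is <= the one before it (ties allowed)."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing_sketch([1, 2, 2, 3]))
print(is_monotonic_decreasing_sketch([3, 3, 1]))
print(is_monotonic_increasing_sketch([1, 3, 2]))
```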

property is_unique

Return True if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
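
The snippets above use placeholder names (h, g, func, a, b, c) and do not run as written. The following self-contained sketch shows both call forms on a small stand-in class. Box, add and scale are hypothetical names introduced for illustration, and the pipe body is a guess at the dispatch logic described above, not cuDF's source.

```python
class Box:
    """Hypothetical stand-in for a Series/DataFrame/Index with a pipe method."""

    def __init__(self, values):
        self.values = list(values)

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under the named keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self becomes the first positional argument.
        return func(self, *args, **kwargs)


def add(box, amount):
    return Box(v + amount for v in box.values)


def scale(factor, data):
    # The data is not the first parameter here, so pipe needs the tuple form.
    return Box(v * factor for v in data.values)


result = Box([1, 2, 3]).pipe(add, amount=1).pipe((scale, "data"), factor=10)
print(result.values)
```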
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
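
The tie-breaking methods are easiest to compare side by side. The sketch below ranks a plain list in ascending order for each method. It is a pure-Python illustration that assumes no missing values; rank_sketch is a hypothetical helper, not the cuDF implementation.

```python
def rank_sketch(values, method="average"):
    """Return 1-based ascending ranks for a list of comparable, non-null values."""
    # Stable sort of positions, so tied values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # order[start:stop] is one group of tied values.
        stop = start
        while stop < len(order) and values[order[stop]] == values[order[start]]:
            stop += 1
        dense += 1
        for offset, i in enumerate(order[start:stop]):
            if method == "average":
                ranks[i] = (start + 1 + stop) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = stop
            elif method == "first":
                ranks[i] = start + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        start = stop
    return ranks


data = [30, 10, 30, 20]
for method in ("average", "min", "max", "first", "dense"):
    print(method, rank_sketch(data, method))
```

The two 30s occupy sorted positions 3 and 4, so they share rank 3.5 under average, 3 under min, 4 under max, 3 and 4 under first, and 3 under dense.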

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
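
The scatter step can be pictured with a small pure-Python sketch. It assumes map_index already holds integer destination ids in the range 0 to map_size - 1 and represents rows as plain tuples. scatter_by_map_sketch is a hypothetical helper, not the cuDF method.

```python
def scatter_by_map_sketch(rows, map_index, map_size=None):
    """Distribute rows into map_size lists according to map_index."""
    if map_size is None:
        # Assumption: default to the smallest size that fits every destination.
        map_size = max(map_index) + 1 if map_index else 0
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        partitions[destination].append(row)
    return partitions


rows = [("r0", 1.0), ("r1", 2.0), ("r2", 3.0), ("r3", 4.0)]
parts = scatter_by_map_sketch(rows, map_index=[1, 0, 1, 2], map_size=4)
for i, part in enumerate(parts):
    print(i, part)
```

A map_size larger than the number of distinct destinations produces trailing empty partitions, which keeps the output list length predictable.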
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
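
As a rough picture of what shifting does, the sketch below moves the values of a plain list and fills the vacated slots. It assumes the pandas-like convention that a positive periods moves values toward the end. shift_sketch is a hypothetical helper, and freq and axis are not modeled.

```python
def shift_sketch(values, periods=1, fill_value=None):
    """Shift list values by `periods` positions, padding with fill_value."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out of range.
        return [fill_value] * n
    if periods > 0:
        return [fill_value] * periods + list(values[: n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift_sketch([1, 2, 3, 4], periods=1))
print(shift_sketch([1, 2, 3, 4], periods=-2, fill_value=0))
```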

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
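The outputs above are consistent with the inputs being interpreted as radians, not degrees. A small plain-Python check using the standard `math` module (an illustration only, run on the host and independent of cuDF):

```python
import math

# The Series example passes 45 directly to tan(); the documented
# result 1.619775 matches the tangent of 45 radians.
tan_of_45_radians = math.tan(45)

# Converting from degrees first gives the familiar value close to 1.
tan_of_45_degrees = math.tan(math.radians(45))
```

If the data is in degrees, it would need to be converted to radians before calling tan().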
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfInput table containing the rows to tile.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
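The tiling semantics can be sketched in plain Python on a list of rows. The helper `tile_rows` is hypothetical and only mirrors the documented result; it is not how cuDF implements tile().

```python
def tile_rows(rows, count):
    """Repeat the whole block of rows `count` times, preserving row order."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # The outer loop repeats the block; the inner loop copies each row.
    return [list(row) for _ in range(count) for row in rows]


rows = [[8, 4, 7], [5, 2, 3]]
tiled = tile_rows(rows, 2)
```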
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be shorter than the input.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
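As a rough illustration of the rule "keep where cond is True, otherwise take other", here is a plain-Python sketch for the scalar case. The helper `where_values` is hypothetical and is not part of cuDF.

```python
def where_values(values, cond, other=None):
    """Keep each value whose condition is True; otherwise use `other`."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if keep else other for v, keep in zip(values, cond)]


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
result = where_values(values, cond, 15)
```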

UInt64Index

class cudf.core.index.UInt64Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is less than or equal to the previous one).

is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is greater than or equal to the previous one).

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
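The meaning of the returned positions can be checked with a plain-Python sketch. The function below is a hypothetical stand-in that sorts positions by their values; it reproduces the documented outputs but says nothing about how cuDF computes them on the GPU.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort `values`."""
    # Sort the positions 0..n-1 by the value found at each position.
    return sorted(range(len(values)), key=values.__getitem__,
                  reverse=not ascending)


flat = [10, 100, 1, 1000]
flat_ascending = argsort(flat)
flat_descending = argsort(flat, ascending=False)

# MultiIndex rows compare like tuples, level by level.
rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
rows_ascending = argsort(rows)
rows_descending = argsort(rows, ascending=False)
```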
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
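For scalar thresholds, the clipping rule can be sketched in plain Python. The helper `clip_values` is hypothetical and only illustrates the Series examples above; it is not cuDF's implementation and does not cover per-column thresholds.

```python
def clip_values(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper  # cap values above the upper threshold
        clipped.append(v)
    return clipped


data = [1, 2, 3, 4]
both = clip_values(data, lower=2, upper=3)
upper_only = clip_values(data, upper=3)
lower_only = clip_values(data, lower=2)
```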
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
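The effect of the sort parameter can be sketched in plain Python. The helper `index_difference` is hypothetical; it mirrors the two documented outputs (sorted by default, original order with sort=False) and is not cuDF's implementation.

```python
def index_difference(left, right, sort=None):
    """Return the values of `left` that do not occur in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        # Keep the first occurrence of each value not found in `right`.
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            return sorted(result)
        except TypeError:
            # Incomparable elements: fall back to the unsorted result.
            return result
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_difference = index_difference(idx1, idx2)
unsorted_difference = index_difference(idx1, idx2, sort=False)
```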
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
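The half-open convention (the last label sits at end - 1) can be illustrated with the standard `bisect` module. This sketch assumes the labels are sorted in ascending order; the function below is a plain-Python stand-in, not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(labels, first, last):
    """Return (begin, end) so that labels[begin:end] spans first..last."""
    begin = bisect_left(labels, first)   # leftmost position of `first`
    end = bisect_right(labels, last)     # one past the rightmost `last`
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range(labels, 20, 30)
selected = labels[begin:end]
```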

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
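The difference between the two sides can be sketched with the standard `bisect` module. This assumes an ascending, sorted list of labels and ignores the kind parameter; the function is a hypothetical stand-in, not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound(labels, label, side):
    """Return the slice bound for `label` on the given side."""
    if side == "left":
        return bisect_left(labels, label)    # leftmost position of label
    if side == "right":
        return bisect_right(labels, label)   # one past the rightmost position
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 3]
left_bound = get_slice_bound(labels, 2, "left")
right_bound = get_slice_bound(labels, 2, "right")
```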

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
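The column-major to row-major conversion can be sketched in plain Python, treating each inner list as one column. The helper `interleave` is hypothetical and only reproduces the ordering shown above.

```python
def interleave(columns):
    """Interleave equal-length columns into one flat, row-major list."""
    # zip(*columns) walks the table row by row; each row is then flattened.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave(columns)
```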
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is less than or equal to the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is greater than or equal to the previous one).

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
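The membership test can be sketched in plain Python; the result has one boolean per index value, in index order. The helper `isin_values` is hypothetical and returns a list rather than a CuPy array.

```python
def isin_values(index_values, values):
    """Return one boolean per index value: is it among `values`?"""
    lookup = set(values)  # set membership keeps each check cheap
    return [v in lookup for v in index_values]


flags = isin_values([1, 2, 3], [1, 4])
```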
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
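The classification rules listed above can be approximated in plain Python, with None standing in for a masked null. The helper `is_na` is hypothetical and does not cover NaT, which has no standard-library equivalent.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing; everything else is present."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, math.nan, 0.32, math.inf, ""]
missing = [is_na(v) for v in values]
```

Infinity and the empty string are reported as present, matching the note in the description.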
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns
index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
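
The NaN and -inf entries in the outputs above follow from the definition of the natural logarithm. A minimal plain-Python sketch of the same element-wise rule, run on the CPU; `safe_log` is a hypothetical helper used only for illustration and is not part of cuDF:

```python
import math


def safe_log(x):
    """Natural log that returns NaN / -inf instead of raising, as in the outputs above."""
    if x < 0:
        return math.nan       # the logarithm of a negative number is undefined
    if x == 0:
        return -math.inf      # limit of log(x) as x approaches 0 from above
    return math.log(x)


values = [-1, 0, 1, 0.5, 100]
logs = [safe_log(v) for v in values]

# Round trip: exp(log(x)) recovers x (up to floating-point error) for positive x.
round_trip = [math.exp(result) for v, result in zip(values, logs) if v > 0]
```

The round trip only holds for strictly positive inputs, because NaN and -inf do not map back to the original values.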
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
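
The replacement rule can be sketched in plain Python. This is an illustration of the semantics only, not cuDF's implementation: `mask_values` and `where_values` are hypothetical helpers, and `None` stands in for `<NA>`. `where` applies the complementary rule, keeping values where the condition is True.

```python
def mask_values(values, cond, other=None):
    """Replace entries where cond is True; keep the rest."""
    return [other if c else v for v, c in zip(values, cond)]


def where_values(values, cond, other=None):
    """Keep entries where cond is True; replace the rest."""
    return [v if c else other for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]

masked = mask_values(ser, cond, 10)      # [10, 10, 2, 1, 0]
masked_na = mask_values(ser, cond)       # [None, None, 2, 1, 0]
kept = where_values(ser, cond, 10)       # [4, 3, 10, 10, 10]
```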
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
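
The classification rules listed above can be sketched in plain Python. This is only an approximation of the semantics: `is_missing` is a hypothetical helper, `None` stands in for an entry whose null mask is set, and NaT is not modelled.

```python
import math


def is_missing(value):
    """True for null (None) and float NaN; False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


samples = [5.0, None, float("nan"), float("inf"), ""]

# Infinity and the empty string are ordinary values, not missing ones.
not_missing = [not is_missing(v) for v in samples]
```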
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
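
The dispatch described above, including the `(callable, data_keyword)` tuple form, can be sketched as a free function. This is a plain-Python illustration of the documented behaviour rather than cuDF's source; `pipe`, `scale`, and `shift_by` are hypothetical names, and lists stand in for Series/DataFrame objects.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj as the first argument, or as a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj      # route the data to the named keyword
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)   # default: data is the first argument


def scale(data, factor):
    return [x * factor for x in data]


def shift_by(offset, data):
    # Takes the data as its second argument, so it needs the tuple form.
    return [x + offset for x in data]


doubled = pipe([1, 2, 3], scale, factor=2)
result = pipe(doubled, (shift_by, "data"), offset=1)   # [3, 5, 7]
```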
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group.

  • min: lowest rank in the group.

  • max: highest rank in the group.

  • first: ranks assigned in the order the values appear in the array.

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values.

  • top: assign smallest rank to NaN values if ascending.

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
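
The tie-breaking methods are easiest to compare side by side. The following is a plain-Python sketch of the five methods on a list with no missing values; `rank` here is a hypothetical stand-alone function, and `na_option`, `pct`, and `axis` are deliberately left out.

```python
def rank(values, method="average", ascending=True):
    """Rank values 1..n, resolving ties according to method."""
    # Stable sort: equal values keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the end of the group of tied values starting at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[i] = (pos + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = pos + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = pos + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        pos = end + 1
    return ranks


data = [7, 3, 7, 1]
ranks_by_method = {m: rank(data, m)
                   for m in ("average", "min", "max", "first", "dense")}
```

For `data`, the two 7s occupy sorted positions 3 and 4, so they receive 3.5 under average, 3 under min, 4 under max, 3 and 4 under first, and 3 under dense.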

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type(DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
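
How `decimals` is resolved per column can be sketched in plain Python, with a dict of lists standing in for a DataFrame. `round_columns` is a hypothetical helper; it relies on Python's built-in `round`, whose handling of exact ties (round half to even) is an assumption here and may not match cuDF in every case.

```python
def round_columns(table, decimals):
    """Round each column of table (dict of lists) to its own number of places."""
    if isinstance(decimals, int):
        # A single int applies to every column.
        decimals = {name: decimals for name in table}
    rounded = {}
    for name, column in table.items():
        if name in decimals:
            rounded[name] = [round(v, decimals[name]) for v in column]
        else:
            rounded[name] = list(column)   # columns not listed are left as is
    return rounded                         # keys that match no column are ignored


table = {"dogs": [.21, .01, .66, .21], "cats": [.32, .67, .03, .18]}
per_column = round_columns(table, {"dogs": 1, "cats": 0, "birds": 2})
```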
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/"columns".

weightsstr or ndarray-like, optional

Only supported for axis=1/"columns".

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState is given, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
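
The interaction between `n`, `frac`, `replace`, and `random_state` can be sketched with the standard library. `sample_items` is a hypothetical helper that uses Python's `random` module, which is an assumption made for illustration; cuDF uses its own random number generation, so the actual rows drawn will differ.

```python
import random


def sample_items(items, n=None, frac=None, replace=False, random_state=None):
    """Draw a random sample from a sequence, mirroring the parameters above."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default to a single item unless a fraction is requested.
        n = 1 if frac is None else round(frac * len(items))
    rng = random.Random(random_state)   # same seed gives the same sample
    if replace:
        return [rng.choice(items) for _ in range(n)]
    return rng.sample(items, n)         # without replacement: no repeats


items = [1, 2, 3, 4, 5]
first = sample_items(items, n=3, random_state=7)
second = sample_items(items, n=3, random_state=7)   # identical to first
oversampled = sample_items(items, n=10, replace=True, random_state=0)
```

Sampling more items than the sequence holds is only possible with `replace=True`, which is why the Series example above passes it when asking for 10 items from 5.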
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
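
The scatter step can be sketched in plain Python, with a list of rows standing in for the DataFrame. This sketch assumes that `map_index` holds integer partition ids in the range 0 to `map_size - 1`; the fallback used when `map_size` is omitted (largest id plus one) is an assumption of this example, not documented behaviour.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Split rows into map_size lists according to map_index."""
    if map_size is None:
        # Assumed default for this sketch only.
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        parts[destination].append(row)   # row order is preserved per partition
    return parts


rows = ["a", "b", "c", "d"]
parts = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
```

Partitions that receive no rows are still present in the output as empty entries, which keeps the output length equal to `map_size`.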
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, the last such index is returned.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
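
For a single ascending column, the `side` parameter corresponds to the two binary-search variants in Python's `bisect` module. A minimal CPU-side sketch under that assumption (the `searchsorted` function below is a hypothetical stand-in, and nulls and descending order are not handled):

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Return the insertion point for each value that keeps sorted_values ordered."""
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
outside = searchsorted(s, [0, 4])                 # [0, 3]
left = searchsorted(s, [1, 3], side="left")       # [0, 2]
right = searchsorted(s, [1, 3], side="right")     # [1, 3]

# Binary search assumes sorted input; on unsorted data the answer is unreliable.
unsorted_result = searchsorted([2, 1, 3], [1])    # [0]
```

With `side='left'` an existing equal value ends up to the right of the insertion point; with `side='right'` it ends up to the left.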
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
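
The relationship between the sorted copy and the indexer returned with `return_indexer=True` can be sketched in plain Python: the indexer lists, for each output position, the position the value came from. `sort_with_indexer` is a hypothetical helper operating on a list.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted copy, positions in the original that produce it)."""
    indexer = sorted(range(len(values)), key=values.__getitem__,
                     reverse=not ascending)
    return [values[i] for i in indexer], indexer


idx = [10, 100, 1, 1000]
sorted_desc, indexer = sort_with_indexer(idx, ascending=False)
# sorted_desc == [1000, 100, 10, 1], indexer == [3, 1, 0, 2]

# Applying the indexer to the original reproduces the sorted copy.
reordered = [idx[i] for i in indexer]
```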
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows of this DataFrame count times to form a new DataFrame.

Parameters
countint

Number of times to tile the rows. Must be non-negative.

Returns
DataFrame

A new DataFrame containing the tiled rows.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
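
Tiling differs from `repeat` in the order of the output rows: tiling repeats the whole block of rows, while `repeat` repeats each row consecutively. A plain-Python sketch of the two orderings, with a list of tuples standing in for the rows (`tile_rows` and `repeat_rows` are hypothetical helpers):

```python
def tile_rows(rows, count):
    """Repeat the whole sequence of rows count times."""
    return list(rows) * count


def repeat_rows(rows, repeats):
    """Repeat each row consecutively; repeats is an int or one int per row."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(rows)
    return [row for row, n in zip(rows, repeats) for _ in range(n)]


rows = [(8, 4, 7), (5, 2, 3)]
tiled = tile_rows(rows, 2)       # block order: row 0, row 1, row 0, row 1
repeated = repeat_rows(rows, 2)  # consecutive order: row 0, row 0, row 1, row 1
```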
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')

Float32Index

class cudf.core.index.Float32Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
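
The inverse relationship stated above only round-trips when x lies in [0, π], the range of values that an inverse cosine returns. The following is a minimal sketch using Python's standard math module on scalars, not cuDF itself; the helper name acos_roundtrip is hypothetical and exists only for illustration.

```python
import math


def acos_roundtrip(x):
    """Return acos(cos(x)); this equals x only when 0 <= x <= pi."""
    return math.acos(math.cos(x))


# 1.0 lies inside [0, pi], so the original value is recovered.
inside = acos_roundtrip(1.0)

# 4.0 is larger than pi, so a different angle with the same cosine comes back.
outside = acos_roundtrip(4.0)

print(inside, outside)
```

For inputs outside [0, π] the result is the angle in that range with the same cosine, which is why the second call does not return 4.0.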
any()

Return whether any element in the Index is True.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arraycupy array

Integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
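
To make the meaning of the returned positions concrete, here is a plain-Python sketch of the argsort semantics on lists. It is an illustration only, not how cuDF computes the result on the GPU; the function name argsort and the handling of ties are assumptions of this sketch.

```python
def argsort(values, ascending=True):
    """Return the positions that would sort `values`.

    Ties keep their original relative order here; cuDF's tie order
    is not specified by this sketch.
    """
    return sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not ascending,
    )


flat = [10, 100, 1, 1000]
# Tuples compare level by level, which mirrors a two-level MultiIndex.
multi = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]

print(argsort(flat))
print(argsort(multi, ascending=False))
```

Indexing the original values with the returned positions yields them in sorted order.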
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
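
The thresholding rule for scalar bounds can be sketched in plain Python as follows. This is an illustration of the semantics on a list, not cuDF's implementation; the function name clip mirrors the method only for readability.

```python
def clip(values, lower=None, upper=None):
    """Limit each value to the closed interval [lower, upper].

    A bound of None means no clipping on that side.
    """
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # raise values below the minimum threshold
        if upper is not None and value > upper:
            value = upper  # lower values above the maximum threshold
        clipped.append(value)
    return clipped


print(clip([1, 2, 3, 4], lower=2, upper=3))
```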
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
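
The effect of the sort parameter can be sketched on plain Python lists. This is an assumption-laden illustration of the documented behaviour, not cuDF's implementation; the standalone function difference is hypothetical.

```python
def difference(left, right, sort=None):
    """Return the unique elements of `left` that are not in `right`.

    sort=None  : try to sort the result, keeping the original order
                 if the elements cannot be compared.
    sort=False : keep the order in which elements appear in `left`.
    """
    excluded = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in excluded and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))
```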
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
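
The three keep options differ only in which occurrence of a repeated value survives. A plain-Python sketch of that rule follows; it is not cuDF's implementation, and unlike the cuDF output shown above it preserves the original order of the values rather than returning them sorted.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Remove duplicates from a list according to `keep`."""
    if keep == "first":
        # dict preserves insertion order, so the first occurrence wins.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Walk backwards so the last occurrence wins, then restore order.
        return list(dict.fromkeys(reversed(values)))[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [value for value in values if counts[value] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


names = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(drop_duplicates(names))
print(drop_duplicates(names, keep="last"))
print(drop_duplicates(names, keep=False))
```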
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
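
The difference between how='any' and how='all' on a MultiIndex can be sketched with tuples, using None as a stand-in for a null level value. The function dropna_rows is a hypothetical helper for illustration and does not reflect cuDF internals.

```python
def dropna_rows(rows, how="any"):
    """Drop tuples containing nulls (None) from a list of tuples.

    how='any' drops a row if any level is null;
    how='all' drops a row only if every level is null.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    return [row for row in rows if not test(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna_rows(rows, how="any"))
print(dropna_rows(rows, how="all"))
```

With the rows above, how='all' removes nothing because no row is null in every level.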
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at position end - 1.
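
The half-open (begin, end) convention can be illustrated with Python's bisect module. This sketch assumes the labels are sorted in ascending order, which is an assumption of the example rather than something stated here, and it is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    """Return (begin, end) so that sorted_labels[begin:end] spans first..last.

    Assumes `sorted_labels` is sorted in ascending order.
    """
    begin = bisect_left(sorted_labels, first)   # leftmost position of `first`
    end = bisect_right(sorted_labels, last)     # one past the rightmost `last`
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range(labels, 20, 30)
print(begin, end, labels[begin:end])
```

Because end is one past the final match, slicing with labels[begin:end] includes the last label.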

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
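
The column-major to row-major conversion can be sketched in plain Python by treating each inner list as one column. This is an illustration of the ordering only, not cuDF's implementation; the standalone function name is hypothetical.

```python
def interleave_columns(columns):
    """Interleave equal-length columns into one row-major list."""
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one row at a time; flatten the rows in order.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave_columns(columns))
```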
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is equal to or greater than the previous one).

property is_unique

Return True if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
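
The rules listed above for what counts as missing can be sketched on plain Python scalars, with None standing in for a masked null. The helper is_missing is hypothetical and only mirrors the documented rules; it does not cover NaT.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN, False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, float("nan"), 0.32, float("inf")]
flags = [is_missing(v) for v in values]
print(flags)

# Empty strings and infinities are not treated as missing.
print(is_missing(""), is_missing(float("-inf")))
```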
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only Scalar or array like with scalars or dataframe with same dimension as self.

Series expects only scalar or series like with same length

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
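
The replacement rule can be sketched on plain lists, with None standing in for the null that results when other is omitted. This illustrates the semantics for a scalar other only and is not cuDF's implementation.

```python
def mask(values, cond, other=None):
    """Replace values[i] with `other` wherever cond[i] is True."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else value for value, flag in zip(values, cond)]


data = [4, 3, 2, 1, 0]
cond = [value > 2 for value in data]
print(mask(data, cond, 10))
print(mask(data, cond))
```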
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in the case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
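
The dispatch rule described above, including the (callable, data_keyword) tuple form, can be sketched as a standalone function. This is a minimal illustration, not cuDF's source; the helpers scale and offset are hypothetical.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj as the first argument or as a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        kwargs[data_keyword] = obj  # pass the data under the named keyword
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def scale(data, factor):
    """Hypothetical helper that takes the data as its first argument."""
    return [value * factor for value in data]


def offset(amount, data):
    """Hypothetical helper that takes the data as its second argument."""
    return [value + amount for value in data]


scaled = pipe([1, 2, 3], scale, factor=2)
result = pipe(scaled, (offset, "data"), amount=1)
print(result)
```

The tuple form is what lets offset participate in a chain even though its data argument is not first.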
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average : average rank of the group

  • min : lowest rank in the group

  • max : highest rank in the group

  • first : ranks assigned in order they appear in the array

  • dense : like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep : assign NaN rank to NaN values

  • top : assign smallest rank to NaN values if ascending

  • bottom : assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
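
The tie-breaking methods can be compared with a small plain-Python sketch. It only illustrates the definitions listed under method; it ignores NaN handling, na_option and pct, and it is not how cuDF computes ranks. The helper name rank_values is hypothetical.

```python
def rank_values(values, method="average"):
    """Rank values from 1 to n, resolving ties according to method."""
    # Positions ordered by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense_rank = 0
    start = 0
    while start < len(order):
        # order[start:end] is one run of tied values.
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for position, i in enumerate(order[start:end]):
            if method == "average":
                ranks[i] = (start + 1 + end) / 2
            elif method == "min":
                ranks[i] = float(start + 1)
            elif method == "max":
                ranks[i] = float(end)
            elif method == "first":
                ranks[i] = float(start + 1 + position)
            elif method == "dense":
                ranks[i] = float(dense_rank)
            else:
                raise ValueError(f"unknown method: {method}")
        start = end
    return ranks


data = [7, 3, 7, 1, 9]
by_method = {
    m: rank_values(data, m) for m in ("average", "min", "max", "first", "dense")
}
```

The two 7s occupy sorted positions 3 and 4, so they receive 3.5 under average, 3 under min, 4 under max, and 3 then 4 under first. Under dense the following value 9 gets rank 4 instead of 5.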

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/"columns".

weightsstr or ndarray-like, optional

Only supported for axis=1/"columns".

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState is given, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
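
The interaction of n, frac, replace and random_state can be sketched with the standard library. This is a plain-Python stand-in, not cuDF's sampler: the seeded random.Random generator and the rounding of frac * len(values) are assumptions made for illustration, and the helper name sample_items is hypothetical.

```python
import random


def sample_items(values, n=None, frac=None, replace=False, random_state=None):
    """Draw items from values; n and frac are mutually exclusive."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default to one item, or to a fraction of the input when frac is given.
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # same seed, same draw
    if replace:
        # With replacement: the same item may be drawn more than once.
        return [rng.choice(values) for _ in range(n)]
    # Without replacement: every drawn position is distinct.
    return rng.sample(values, n)


data = [1, 2, 3, 4, 5]
first = sample_items(data, n=3, random_state=42)
second = sample_items(data, n=3, random_state=42)
with_replacement = sample_items(data, n=10, replace=True, random_state=0)
```

Because first and second share a seed they are identical, which is the reproducibility that random_state provides. Sampling 10 items from 5 is only possible with replace=True.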
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be greater than or equal to the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
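
The scatter rule can be sketched in plain Python with rows as tuples. This illustrates only the mapping from map_index to output partitions; it is not cuDF's implementation, the helper name scatter_rows is hypothetical, and map_size is made a required argument here to avoid guessing the default.

```python
def scatter_rows(rows, map_index, map_size):
    """Split rows into map_size lists; row i goes to partition map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        if not 0 <= destination < map_size:
            raise ValueError(f"destination {destination} is outside 0..{map_size - 1}")
        partitions[destination].append(row)
    return partitions


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
parts = scatter_rows(rows, [0, 2, 0, 1], map_size=4)
```

Rows 0 and 2 land in partition 0, row 3 in partition 1 and row 1 in partition 2. Partition 3 stays empty because map_size exceeds the number of distinct destinations.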
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’}, optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’}, optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
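
The meaning of side for a single ascending column can be sketched with the standard bisect module. This is a CPU illustration of the insertion-point rule only, not cuDF's implementation; the helper name insertion_points is hypothetical.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    """Return, for each value, the position that keeps sorted_values ordered."""
    locate = {"left": bisect_left, "right": bisect_right}[side]
    return [locate(sorted_values, v) for v in values]


data = [1, 2, 3]
left = insertion_points(data, [1, 3], side="left")    # before equal entries
right = insertion_points(data, [1, 3], side="right")  # after equal entries
outside = insertion_points(data, [0, 4])

# Binary search assumes sorted input; on unsorted data the answer is unreliable.
unsorted_result = insertion_points([2, 1, 3], [1])
```

The results for left, right and outside agree with the Series examples above. The unsorted call returns position 0, the same misleading answer shown for cudf.Series([2, 1, 3]).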
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
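
A minimal plain-Python sketch of the shifting rule follows. It assumes pandas-like semantics, where positive periods move values toward the end and vacated positions take fill_value; that is an assumption, not something stated above. The freq and axis parameters are not modelled, and the helper name shift_values is hypothetical.

```python
def shift_values(values, periods=1, fill_value=None):
    """Move values by periods positions, padding vacated slots with fill_value."""
    n = len(values)
    if periods == 0 or n == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out; only padding remains.
        return [fill_value] * n
    if periods > 0:
        # Positive: pad at the front, drop the tail.
        return [fill_value] * periods + list(values[: n - periods])
    # Negative: drop the head, pad at the back.
    return list(values[-periods:]) + [fill_value] * (-periods)


forward = shift_values([1, 2, 3, 4])
backward = shift_values([1, 2, 3, 4], periods=-2, fill_value=0)
```

The output always has the same length as the input, so values pushed past either end are discarded.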

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
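
The relationship between the sorted copy and the indexer can be sketched in plain Python. This is an illustration only, not cuDF's implementation; MultiIndex entries are modelled as tuples, which Python compares level by level, and the helper name sort_labels is hypothetical.

```python
def sort_labels(labels, ascending=True, return_indexer=False):
    """Return a sorted copy of labels and, optionally, the positions used."""
    # indexer[k] is the original position of the k-th label in sorted order.
    indexer = sorted(
        range(len(labels)), key=lambda i: labels[i], reverse=not ascending
    )
    sorted_labels = [labels[i] for i in indexer]
    if return_indexer:
        return sorted_labels, indexer
    return sorted_labels


labels = [10, 100, 1, 1000]
descending, indexer = sort_labels(labels, ascending=False, return_indexer=True)

pairs = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
sorted_pairs = sort_labels(pairs)
```

Indexing labels with indexer reproduces descending, which is the property that makes the indexer useful for reordering other data in the same way.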
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
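
tile and repeat differ only in ordering, which a plain-Python sketch with rows as lists makes visible. This is an illustration of the two orderings, not cuDF's implementation; the helper names tile_rows and repeat_rows are hypothetical.

```python
def tile_rows(rows, count):
    """Concatenate the whole block of rows count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [row for _ in range(count) for row in rows]


def repeat_rows(rows, repeats):
    """Repeat each row consecutively before moving to the next one."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return [row for row in rows for _ in range(repeats)]


rows = [[8, 4, 7], [5, 2, 3]]
tiled = tile_rows(rows, 2)
repeated = repeat_rows(rows, 2)
```

tiled cycles through the rows as a block (first, second, first, second), while repeated keeps duplicates adjacent (first, first, second, second).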
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be smaller than the input.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
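
where and its counterpart mask (listed in the method summary as replacing values where the condition is True) can be sketched in plain Python. The sketch handles only a scalar other and is not cuDF's implementation; the helper names where_values and mask_values are hypothetical.

```python
def where_values(values, cond, other=None):
    """Keep values where cond is True; use other where it is False."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if keep else other for v, keep in zip(values, cond)]


def mask_values(values, cond, other=None):
    """Inverse of where_values: replace values where cond is True."""
    return where_values(values, [not c for c in cond], other)


labels = [4, 3, 2, 1, 0]
cond = [v > 2 for v in labels]
kept = where_values(labels, cond, 15)
masked = mask_values(labels, cond, 15)
```

kept matches the Index example above, and masked replaces exactly the entries that kept preserves.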

Float64Index

class cudf.core.index.Float64Index(data=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
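
For scalar thresholds the clipping rule reduces to two comparisons per element, sketched here in plain Python. This illustrates the rule only and is not cuDF's implementation; the helper name clip_values is hypothetical.

```python
def clip_values(values, lower=None, upper=None):
    """Pull values outside [lower, upper] back to the nearest boundary."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold
        clipped.append(v)
    return clipped


data = [1, 2, 3, 4]
both = clip_values(data, lower=2, upper=3)
upper_only = clip_values(data, upper=3)
lower_only = clip_values(data, lower=2)
```

The three results correspond to the Series examples above. A threshold of None disables clipping on that side.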
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cuDF attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
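
The effect of the sort parameter can be sketched with plain Python lists. This is an assumption-laden illustration, not cuDF's implementation: index_difference is a hypothetical helper, and handling of duplicate values is not modelled.

```python
def index_difference(left, right, sort=None):
    """Return the elements of left that are not in right."""
    exclude = set(right)
    result = [item for item in left if item not in exclude]
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep the unsorted order
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```
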
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
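
The three keep options can be compared with a small pure-Python sketch. It preserves input order, which may differ from the ordering cuDF returns (the example above shows sorted output); dedupe is a hypothetical helper, not part of cuDF.

```python
from collections import Counter


def dedupe(values, keep="first"):
    """Remove duplicates according to the keep policy, preserving order."""
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # drop every duplicated value
    if keep == "last":
        # Keeping the last occurrence equals keeping the first one in reverse.
        return dedupe(values[::-1], keep="first")[::-1]
    if keep != "first":
        raise ValueError("keep must be 'first', 'last' or False")
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(dedupe(animals, keep="first"))  # ['lama', 'cow', 'beetle', 'hippo']
print(dedupe(animals, keep="last"))   # ['cow', 'beetle', 'lama', 'hippo']
print(dedupe(animals, keep=False))    # ['cow', 'beetle', 'hippo']
```
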
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
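
The difference between how='any' and how='all' for a MultiIndex can be sketched with tuples, using None to stand in for a null level value. This is an illustration of the rule only; drop_null_rows is a hypothetical helper.

```python
def drop_null_rows(rows, how="any"):
    """Drop a row when any (or all) of its level values are None."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    is_dropped = any if how == "any" else all
    return [row for row in rows if not is_dropped(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, None)]
print(drop_null_rows(rows, how="any"))  # [(1, 1), (1, 5), (4, 2)]
print(drop_null_rows(rows, how="all"))  # [(1, 1), (1, 5), (None, 2), (4, 2)]
```
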
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
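
For a sorted list of labels, the left and right bounds behave like the standard library's bisect functions. The sketch below assumes sorted labels and does not model the kind parameter; slice_bound is a hypothetical helper, not the cuDF method.

```python
from bisect import bisect_left, bisect_right


def slice_bound(sorted_labels, label, side):
    """Leftmost position of label, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
print(slice_bound(labels, 20, "left"))   # 1
print(slice_bound(labels, 20, "right"))  # 3
```
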

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
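
The column-major to row-major conversion can be sketched with zip over equal-length Python lists. This illustrates the ordering only; interleave is a hypothetical helper and not the cuDF implementation.

```python
def interleave(columns):
    """Flatten equal-length columns into one list, taking one value per column in turn."""
    return [value for row in zip(*columns) for value in row]


print(interleave([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```
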
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is equal to or less than the previous one).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is equal to or greater than the previous one).
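
Both monotonicity checks allow equal neighbours. A pure-Python sketch of the definitions follows, with hypothetical helper names; null handling is not modelled.

```python
def monotonic_increasing(values):
    """True when every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def monotonic_decreasing(values):
    """True when every value is <= the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(monotonic_increasing([1, 2, 2, 3]))  # True
print(monotonic_decreasing([1, 2, 2, 3]))  # False
print(monotonic_decreasing([3, 3, 1]))     # True
```
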

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
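
The replacement rule (replace where cond is True, keep otherwise) can be sketched in plain Python, with None standing in for a null value. mask_values is a hypothetical helper that handles only a scalar other.

```python
def mask_values(values, cond, other=None):
    """Replace each value whose condition is True with other."""
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [value > 2 for value in ser]
print(mask_values(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask_values(ser, cond))      # [None, None, 2, 1, 0]
```
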
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
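
The dispatch between the plain-callable form and the (callable, data_keyword) tuple form can be sketched as a free function. This is a minimal illustration, not cuDF's code; pipe_sketch, scale and offset are hypothetical names.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Call func with obj as first argument, or under a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        kwargs[data_keyword] = obj  # data goes in under the named keyword
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def scale(data, factor):
    return [x * factor for x in data]


def offset(amount, data):
    # Takes its data as the second argument.
    return [x + amount for x in data]


scaled = pipe_sketch([1, 2, 3], scale, factor=2)
result = pipe_sketch(scaled, (offset, "data"), amount=1)
print(result)  # [3, 5, 7]
```
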
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
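
The tie-breaking methods can be compared with a pure-Python sketch for a single column. It is an illustration under simplifying assumptions: NaN handling (na_option), pct and axis are not modelled, and rank_values is a hypothetical helper rather than the cuDF implementation.

```python
def rank_values(values, method="average", ascending=True):
    """Rank values from 1 to n, resolving ties according to method."""
    # Positions of the values in sorted order; the sort is stable.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0] * len(values)
    group_number = 0
    start = 0
    while start < len(order):
        # Extend end to cover the whole group of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group_number += 1
        for step, idx in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[idx] = (start + end) / 2 + 1
            elif method == "min":
                ranks[idx] = start + 1
            elif method == "max":
                ranks[idx] = end + 1
            elif method == "first":
                ranks[idx] = start + step + 1
            elif method == "dense":
                ranks[idx] = group_number
            else:
                raise ValueError(f"unknown method: {method}")
        start = end + 1
    return ranks


data = [3, 1, 3, 2]
print(rank_values(data, "average"))  # [3.5, 1.0, 3.5, 2.0]
print(rank_values(data, "min"))      # [3, 1, 3, 2]
print(rank_values(data, "max"))      # [4, 1, 4, 2]
print(rank_values(data, "first"))    # [3, 1, 4, 2]
print(rank_values(data, "dense"))    # [3, 1, 3, 2]
```
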

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
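
The scalar and per-element forms of repeats can be sketched with a list comprehension. repeat_values is a hypothetical helper that returns values only and does not model the repeated index labels shown above.

```python
def repeat_values(values, repeats):
    """Repeat each element consecutively; repeats is an int or one count per element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    return [value for value, count in zip(values, repeats) for _ in range(count)]


print(repeat_values([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat_values([0, 2], 2))       # [0, 0, 2, 2]
print(repeat_values([0, 2], 0))       # []
```
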
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis = 1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from the current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
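
How n, frac, replace and random_state interact can be sketched with the standard library's random module. This is an illustration only: sample_items is a hypothetical helper, the rounding of frac is an assumption, and the values drawn will not match cuDF's generator.

```python
import random


def sample_items(values, n=None, frac=None, replace=False, random_state=None):
    """Draw n items (or a fraction of them), with or without replacement."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # Default to one item when neither n nor frac is given.
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # same seed, same sample
    if replace:
        return [rng.choice(values) for _ in range(n)]
    return rng.sample(values, n)


first = sample_items([1, 2, 3, 4, 5], n=3, random_state=0)
second = sample_items([1, 2, 3, 4, 5], n=3, random_state=0)
print(first == second)  # True
```
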
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
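
The scatter step can be sketched with plain lists, assuming map_index holds integer destinations in the range 0 to map_size - 1. scatter_rows is a hypothetical helper; the real method returns cudf.DataFrame objects.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Send each row to the output list named by its map_index entry."""
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        parts[destination].append(row)
    return parts


print(scatter_rows(["a", "b", "c", "d"], [1, 0, 1, 2], map_size=4))
# [['b'], ['a', 'c'], ['d'], []]
```
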
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 17, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  17
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
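
For a single ascending column, the left and right insertion points match the standard library's bisect functions. The sketch below covers only that one-dimensional case; ascending=False, na_position and multi-column frames are not modelled, and insertion_points is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    """Positions at which each value could be inserted to keep the order."""
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, value) for value in values]


print(insertion_points([1, 2, 3], [0, 4]))                # [0, 3]
print(insertion_points([1, 2, 3], [1, 3], side="left"))   # [0, 2]
print(insertion_points([1, 2, 3], [1, 3], side="right"))  # [1, 3]
```
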
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
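
The relationship between the sorted copy and the returned indexer can be illustrated without a GPU. The following is a minimal pure-Python sketch, not cuDF's implementation: `sort_with_indexer` is a hypothetical helper, values are modeled as a plain list, and the ordering of ties is an assumption of the sketch.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted_values, indexer) for a list of comparable values."""
    # indexer[i] is the position in `values` of the i-th sorted element.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    return [values[i] for i in indexer], indexer


values = [10, 100, 1, 1000]
sorted_desc, indexer = sort_with_indexer(values, ascending=False)
```

Indexing the original values with the indexer reproduces the sorted copy, which is what makes the indexer useful for reordering companion data.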
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
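
The row-repetition semantics can be sketched in plain Python. This is an illustration only, not cuDF's implementation: `tile_rows` is a hypothetical helper and rows are modeled as lists.

```python
def tile_rows(rows, count):
    """Repeat the whole sequence of rows `count` times (hypothetical helper)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # The original row order is preserved within each repetition.
    return [list(row) for _ in range(count) for row in rows]


rows = [[8, 4, 7], [5, 2, 3]]
tiled = tile_rows(rows, 2)
```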
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be smaller than the input.
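
The two fillna modes described above can be sketched in plain Python. This is a hedged illustration, not cuDF's code: `to_dense` is a hypothetical stand-in, nulls are modeled as `None`, and the result is a list rather than a numpy array.

```python
import math


def to_dense(values, fillna=None):
    """Model the two documented modes; nulls are represented by None."""
    if fillna is None:
        # Nulls are skipped, so the result may be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, which forces a floating-point result.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
skipped = to_dense(data)
filled = to_dense(data, fillna="pandas")
```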

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
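
The replacement rule can be sketched in plain Python. This is an illustration under stated assumptions, not cuDF's implementation: the function below is a hypothetical model that works on lists, and it treats a list or tuple `other` as positional replacements.

```python
def where(values, cond, other=None):
    """Keep values[i] where cond[i] is True, else take the replacement."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    # A scalar `other` is broadcast; a list-like `other` is matched by position.
    is_list_like = isinstance(other, (list, tuple))
    return [v if keep else (other[i] if is_list_like else other)
            for i, (v, keep) in enumerate(zip(values, cond))]


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
result = where(values, cond, 15)
```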

CategoricalIndex

class cudf.core.index.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
           names=['a', 'b'])
Attributes
categories

The categories of this categorical.

codes

The category codes of this categorical.

dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed using ascending parameter, by setting it to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
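
The MultiIndex result follows from decoding levels and codes into tuples and ordering them lexicographically. The sketch below reproduces that reasoning in plain Python; it is an illustration only, and the variable names are hypothetical.

```python
levels = [[1, 3, 4, -10], [1, 11, 5]]
codes = [[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]]

# Decode each row: codes[k][i] selects an entry of levels[k].
rows = [tuple(level[code[i]] for level, code in zip(levels, codes))
        for i in range(len(codes[0]))]

# Tuples compare lexicographically: level "x" first, then level "y".
ascending_order = sorted(range(len(rows)), key=lambda i: rows[i])
descending_order = sorted(range(len(rows)), key=lambda i: rows[i],
                          reverse=True)
```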
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
property categories

The categories of this categorical.

clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
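
For scalar thresholds, the clipping rule reduces to clamping each value into the closed interval between lower and upper. A minimal pure-Python sketch, assuming list input and a hypothetical `clip` helper rather than the cuDF method:

```python
def clip(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the minimum threshold
        if upper is not None and v > upper:
            v = upper  # lower values above the maximum threshold
        out.append(v)
    return out


data = [1, 2, 3, 4]
both = clip(data, lower=2, upper=3)
upper_only = clip(data, upper=3)
lower_only = clip(data, lower=2)
```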
property codes

The category codes of this categorical.

copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
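
The effect of the sort parameter can be sketched in plain Python. This is an illustration, not cuDF's implementation: `difference` below is a hypothetical list-based model, and falling back to the unsorted result on TypeError is how the sketch interprets the description above.

```python
def difference(left, right, sort=None):
    """Elements of `left` not in `right`; sort=None attempts to sort."""
    excluded = set(right)
    seen = set()
    result = []
    for item in left:
        # Set semantics: each surviving element appears once.
        if item not in excluded and item not in seen:
            seen.add(item)
            result.append(item)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: keep the unsorted result.
            pass
    return result


sorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
```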
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
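
The difference between how='any' and how='all' only matters for multi-level entries. A minimal pure-Python sketch, assuming rows are tuples and nulls are modeled as `None`; `dropna` here is a hypothetical helper, not the cuDF method:

```python
def dropna(rows, how="any"):
    """Drop tuples whose levels are null (None) under the given rule."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    # A row is dropped when any (or all) of its levels are null.
    return [row for row in rows if not test(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
kept_any = dropna(rows)
kept_all = dropna(rows, how="all")
```

With this data no row is entirely null, so how='all' keeps every row while how='any' drops the two rows with a null first level.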
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.
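
The idea of encoding values as integer labels can be sketched in plain Python. This is an illustration only: the helper is hypothetical, nulls are modeled as `None`, and numbering the unique values in order of first appearance is an assumption of the sketch, not something this reference states about cuDF.

```python
def factorize(values, na_sentinel=-1):
    """Return (codes, uniques); nulls (None) receive `na_sentinel`."""
    uniques = []
    positions = {}
    codes = []
    for v in values:
        if v is None:
            codes.append(na_sentinel)
            continue
        if v not in positions:
            # First time this value is seen: assign the next label.
            positions[v] = len(uniques)
            uniques.append(v)
        codes.append(positions[v])
    return codes, uniques


codes, uniques = factorize(["b", "a", None, "b"])
```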

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
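
The half-open convention, where the last value sits at position end - 1, can be illustrated with binary search on a sorted list. This sketch is not cuDF's implementation: the helper is hypothetical, it assumes ascending labels, and for labels that are absent it simply returns insertion positions.

```python
from bisect import bisect_left, bisect_right


def find_label_range(labels, first, last):
    """Return (begin, end) so that labels[begin:end] spans first..last."""
    begin = bisect_left(labels, first)
    # bisect_right lands one past the last match, so `last` sits at end - 1.
    end = bisect_right(labels, last)
    return begin, end


labels = [10, 20, 30, 40, 50]
begin, end = find_label_range(labels, 20, 40)
```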

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
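
The column-major to row-major conversion can be sketched with plain Python lists. The helper name interleave is hypothetical and the code only mirrors the ordering shown in the example above; it says nothing about how cuDF performs the operation on the GPU.

```python
def interleave(columns):
    """Hypothetical helper: interleave equal-length columns row by row."""
    # zip(*columns) walks the table one row at a time.
    return [value for row in zip(*columns) for value in row]


cols = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave(cols)
print(interleaved)  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```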
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing, that is, each value is less than or equal to the one before it.

property is_monotonic_increasing

Return True if the index values are monotonically increasing, that is, each value is greater than or equal to the one before it.

property is_unique

Return True if all values in the index are unique.
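
As a rough illustration of what the three properties above report, here is a plain-Python sketch over a list of values. The function names simply reuse the property names for readability; they are stand-ins, not the cuDF implementation.

```python
def is_monotonic_increasing(values):
    # Equal neighbours are allowed, so the comparison is <= rather than <.
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values):
    return len(set(values)) == len(values)


plateau = [1, 2, 2, 3]
print(is_monotonic_increasing(plateau))  # True
print(is_monotonic_decreasing(plateau))  # False
print(is_unique(plateau))                # False
```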

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
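
Which keys survive each join type can be sketched with Python sets. This is a simplified, single-level illustration that assumes unique labels and always returns the keys sorted (comparable to sort=True); join_keys is a hypothetical helper, not part of cuDF.

```python
def join_keys(left, right, how="left"):
    """Hypothetical helper: sorted join keys of two lists of unique labels."""
    left_set, right_set = set(left), set(right)
    if how == "left":
        keys = left_set
    elif how == "right":
        keys = right_set
    elif how == "inner":
        keys = left_set & right_set   # labels present on both sides
    elif how == "outer":
        keys = left_set | right_set   # labels present on either side
    else:
        raise ValueError(f"unknown join type: {how}")
    return sorted(keys)


lhs_a = [2, 3, 1]   # the 'a' level of lhs in the example above
rhs_a = [1, 4, 3]
print(join_keys(lhs_a, rhs_a, how="inner"))  # [1, 3]
print(join_keys(lhs_a, rhs_a, how="outer"))  # [1, 2, 3, 4]
```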
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
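
The inverse relationship with exp can be checked on the host with Python's standard math module, used here purely as a stand-in for the element-wise GPU operation. Note one difference: math.log raises ValueError for zero and negative inputs, whereas the Series example above shows -inf and NaN for those elements.

```python
import math

values = [0.32434, 0.5, 100.0]
logs = [math.log(x) for x in values]
round_trip = [math.exp(y) for y in logs]

# exp undoes log up to floating-point rounding.
print(all(math.isclose(a, b) for a, b in zip(values, round_trip)))

# The scalar math function rejects negative input instead of returning NaN.
try:
    math.log(-1.0)
    raised = False
except ValueError:
    raised = True
print(raised)
```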
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only Scalar or array like with scalars or dataframe with same dimension as self.

Series expects only scalar or series like with same length

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
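
mask and where are mirror images of each other: mask replaces where the condition is True, where replaces where it is False. A minimal plain-Python sketch of that relationship follows; the two functions are hypothetical list-based stand-ins, with None playing the role of <NA>.

```python
def mask(values, cond, other=None):
    """Replace entries whose condition is True."""
    return [other if c else v for v, c in zip(values, cond)]


def where(values, cond, other=None):
    """Replace entries whose condition is False."""
    return [v if c else other for v, c in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]

print(mask(ser, cond, 10))   # [10, 10, 2, 1, 0]
print(mask(ser, cond))       # [None, None, 2, 1, 0]
print(where(ser, cond, 15))  # [4, 3, 15, 15, 15]
```

Negating the condition turns one into the other: mask(values, cond, x) gives the same result as where(values, [not c for c in cond], x).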
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (for float dtypes) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
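
The snippets above use placeholder names (h, g, func, a, b, c) and are not runnable as written. The following self-contained sketch shows both calling conventions on a plain list. The free function pipe is a hypothetical stand-in for the .pipe method, and scale and shift are made-up example functions.

```python
def pipe(obj, func, *args, **kwargs):
    """Hypothetical stand-in for obj.pipe(func, *args, **kwargs)."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def scale(data, factor):
    return [x * factor for x in data]


def shift(offset, data):
    # The data is the second argument here, so the tuple form is needed.
    return [x + offset for x in data]


scaled = pipe([1, 2, 3], scale, factor=2)
result = pipe(scaled, (shift, "data"), offset=1)
print(result)  # [3, 5, 7]
```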
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
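
To make the tie-breaking methods concrete, here is a plain-Python sketch that ranks a list in ascending order. It is an illustration of the documented semantics only: rank is a hypothetical helper, and NaN handling, pct and descending order are left out.

```python
def rank(values, method="average"):
    """Hypothetical helper: 1-based ascending ranks with a tie-breaking method."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method}")
    # Positions ordered by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0  # first sorted position of the current tie group
    group = 0  # number of tie groups seen so far
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group += 1
        for offset, i in enumerate(order[start:end + 1]):
            ranks[i] = {
                "average": (start + end) / 2 + 1,
                "min": start + 1,
                "max": end + 1,
                "first": start + offset + 1,
                "dense": group,
            }[method]
        start = end + 1
    return ranks


data = [7, 3, 7, 1]
for name in ("average", "min", "max", "first", "dense"):
    print(name, rank(data, name))
```

For this input the two 7s tie for positions 3 and 4, so average gives both 3.5, min gives 3, max gives 4, first gives 3 and 4 in input order, and dense gives 3 because only two smaller groups precede them.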

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
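
The difference between a scalar and an array-like repeats argument can be sketched on a plain list. The helper below is hypothetical and only mirrors the ordering of the outputs above; note that consecutive repetition differs from tiling, where the whole sequence is appended to itself.

```python
def repeat(values, repeats):
    """Hypothetical helper: repeat each element consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)  # same count for every element
    if len(repeats) != len(values):
        raise ValueError("repeats must have one entry per element")
    return [v for v, n in zip(values, repeats) for _ in range(n)]


s = [0, 2]
print(repeat(s, [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat(s, 2))       # [0, 0, 2, 2]
print(s * 2)              # [0, 2, 0, 2]  (tiling, for comparison)
```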
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
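
How the decimals argument is matched against column names can be sketched with a dict of lists. round_columns is a hypothetical helper built on Python's built-in round, which rounds ties to the nearest even digit; cuDF's tie-breaking may differ, so treat this as an illustration of the per-column lookup only.

```python
def round_columns(table, decimals=0):
    """Hypothetical helper: round each column of a dict-of-lists table."""
    rounded = {}
    for name, column in table.items():
        if isinstance(decimals, int):
            places = decimals              # same precision for every column
        else:
            places = decimals.get(name)    # None: column not listed, leave as is
        if places is None:
            rounded[name] = list(column)
        else:
            rounded[name] = [round(v, places) for v in column]
    return rounded


table = {"dogs": [0.21, 0.01, 0.66, 0.21], "cats": [0.32, 0.67, 0.03, 0.18]}
print(round_columns(table, {"dogs": 1, "cats": 0}))
# 'birds' is not a column, so it is ignored; 'cats' is not listed, so it is unchanged.
print(round_columns(table, {"dogs": 1, "birds": 2}))
```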
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
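
The role of random_state and replace can be sketched with Python's standard random module. This is a host-side analogy, not cuDF's sampler: sample is a hypothetical helper that returns (position, value) pairs so the original positions stay visible, and the seeds used are arbitrary.

```python
import random


def sample(values, n, random_state=None, replace=False):
    """Hypothetical helper: pick n (position, value) pairs from a list."""
    rng = random.Random(random_state)  # same seed, same picks
    positions = range(len(values))
    if replace:
        picked = rng.choices(positions, k=n)  # positions may repeat
    else:
        picked = rng.sample(positions, n)     # positions are distinct
    return [(p, values[p]) for p in picked]


data = [1, 2, 3, 4, 5]
first = sample(data, 3, random_state=42)
second = sample(data, 3, random_state=42)
print(first == second)  # True: the seed makes the draw reproducible

# Drawing more items than exist is only possible with replacement.
with_replacement = sample(data, 10, random_state=0, replace=True)
print(len(with_replacement))  # 10
```

Every returned value is tied to the position it was drawn from, which is why a sampled row always shows the value stored under its index label.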
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
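
The scatter step can be pictured as bucketing rows by destination. The sketch below assumes map_index holds integer destinations in the range 0 to map_size - 1 and represents each output DataFrame as a plain list of rows; both are simplifying assumptions, and the helper is a hypothetical stand-in for the method.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: split rows into map_size buckets."""
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    buckets = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        buckets[destination].append(row)  # row order within a bucket is kept
    return buckets


rows = ["a", "b", "c", "d"]
parts = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
print(parts)  # [['b'], ['a', 'c'], ['d'], []]
```

A map_size larger than the number of distinct destinations simply yields empty trailing buckets.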
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
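
The sortedness requirement comes from binary search. The sketch below reproduces the single-column examples with Python's standard bisect module and adds an explicit check that the input is ascending; the helper is hypothetical, handles only ascending lists without nulls, and is not how cuDF implements the method.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Hypothetical helper: insertion points into an ascending list."""
    if any(a > b for a, b in zip(sorted_values, sorted_values[1:])):
        raise ValueError("input must be sorted in ascending order")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]

# Without the check, binary search on unsorted data quietly returns a
# misleading position, as in the [2, 1, 3] example above.
print(bisect_left([2, 1, 3], 1))  # 0
```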
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
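
The outputs above are consistent with the inputs being read as radians: the element 90, for example, maps to 0.893997 rather than 1. A short host-side check with Python's standard math module, used here only as a stand-in for the element-wise GPU operation, shows the difference and how to convert degrees first.

```python
import math

as_radians = math.sin(45.0)                # 45 is treated as radians
as_degrees = math.sin(math.radians(90.0))  # convert degrees explicitly

print(round(as_radians, 6))  # 0.850904, matching the Series example
print(round(as_degrees, 6))  # 1.0
```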
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
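
The relationship between the sorted result and the returned indexer can be sketched with plain Python: the indexer lists, for each output position, the input position the value came from. sort_values below is a hypothetical list-based stand-in; tuples are used to mimic MultiIndex entries, which compare level by level.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Hypothetical helper: sorted copy of a list, optionally with its indexer."""
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result


idx = [10, 100, 1, 1000]
print(sort_values(idx))  # [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])

midx = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
print(sort_values(midx))
# [(-10, 1), (1, 1), (1, 5), (3, 11), (4, 11)]
```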
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing the rows to tile.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be shorter than the input.
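
The effect of the two fillna settings on the output length can be sketched on a plain list, with None standing in for a null. to_dense is a hypothetical helper and the conversion to float is an assumption made so that NaN can be stored; it is not a description of cuDF's dtype rules.

```python
import math


def to_dense(values, fillna=None):
    """Hypothetical helper: None plays the role of a cuDF null."""
    if fillna is None:
        return [v for v in values if v is not None]  # nulls are dropped
    if fillna == "pandas":
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
skipped = to_dense(data)
filled = to_dense(data, fillna="pandas")
print(len(skipped), len(filled))  # 2 3
```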

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
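
The replacement rule can be written out element by element. The following is a minimal pure-Python sketch of the semantics, assuming a scalar `other`; `where_model` is a hypothetical name and does not reflect how cuDF evaluates the condition on the GPU.

```python
def where_model(values, cond, other=None):
    """Keep values where cond is True, substitute `other` where it is False."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if keep else other for v, keep in zip(values, cond)]


values = [4, 3, 2, 1, 0]
result = where_model(values, [v > 2 for v in values], 15)
print(result)  # [4, 3, 15, 15, 15]
```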

StringIndex

class cudf.core.index.StringIndex(values, copy=False, **kwargs)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return whether the index values are monotonically decreasing (only equal or decreasing).

is_monotonic_increasing

Return whether the index values are monotonically increasing (only equal or increasing).

is_unique

Return if the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

str

Vectorized string functions for Series and Index.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
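
What argsort returns can be reproduced with the standard library. This is a sketch of the semantics only: `argsort_model` is a hypothetical name, tuples stand in for MultiIndex rows, and the handling of ties (Python's stable sort) is an assumption that the examples above do not exercise.

```python
def argsort_model(values, ascending=True):
    """Return the positions that would sort `values`."""
    positions = range(len(values))
    return sorted(positions, key=lambda i: values[i], reverse=not ascending)


flat = [10, 100, 1, 1000]
print(argsort_model(flat))                   # [2, 0, 1, 3]
print(argsort_model(flat, ascending=False))  # [3, 1, 0, 2]

# Tuples compare lexicographically, like the rows of a MultiIndex.
multi = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
print(argsort_model(multi))                  # [4, 0, 1, 2, 3]
```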
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
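
For scalar thresholds, clipping reduces to two comparisons per element. The sketch below models that rule on a plain list; `clip_model` is a hypothetical name, and array-like thresholds and in-place updates are deliberately left out.

```python
def clip_model(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold: raise to lower
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold: reduce to upper
        clipped.append(v)
    return clipped


data = [1, 2, 3, 4]
print(clip_model(data, lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_model(data, upper=3))           # [1, 2, 3, 3]
print(clip_model(data, lower=2))           # [2, 2, 3, 4]
```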
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
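
The interaction between the set difference and the sort option can be sketched in pure Python. This is an illustrative model, not cuDF's implementation: `difference_model` is a hypothetical name, values are assumed to be hashable, and dropping repeated values from the result is an assumption.

```python
def difference_model(left, right, sort=None):
    """Elements of `left` that are not in `right`, optionally sorted."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: keep the order of first appearance.
            pass
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
print(difference_model(idx1, idx2))              # [1, 2]
print(difference_model(idx1, idx2, sort=False))  # [2, 1]
```

With mixed values such as `[2, "a"]`, sorting raises a TypeError inside the model, so the unsorted result is returned instead of an error.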
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=- 1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
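
The relationship between the returned pair and the labels can be shown with `bisect`. This is a sketch that assumes the labels are sorted in ascending order; `find_label_range_model` is a hypothetical name, and how cuDF treats unsorted or missing labels is not covered here.

```python
from bisect import bisect_left, bisect_right


def find_label_range_model(labels, first, last):
    """Return (begin, end) so that labels[begin:end] spans first..last."""
    begin = bisect_left(labels, first)  # position of the first `first`
    end = bisect_right(labels, last)    # one past the final `last`
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_model(labels, 20, 30)
print(begin, end)          # 1 4
print(labels[begin:end])   # [20, 20, 30]
print(labels[end - 1])     # 30
```

Because `end` is exclusive, the pair can be used directly as slice bounds.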

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> df = DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
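
The column-major to row-major conversion can be modelled with `zip`. This is a minimal sketch on plain lists, where each inner list is one column; `interleave_columns_model` is a hypothetical name, and the equal-length check is an assumption of the model.

```python
def interleave_columns_model(columns):
    """Flatten equal-length columns row by row into a single list."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flatten the rows in order.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
interleaved = interleave_columns_model(columns)
print(interleaved)  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```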
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (only equal or decreasing).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (only equal or increasing).

property is_unique

Return if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
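
The rules for what counts as missing can be condensed into a small predicate. This is a pure-Python sketch: `is_na` is a hypothetical helper, `None` stands in for a masked null, and NaT is not modelled because the standard library has no equivalent value.

```python
import math


def is_na(value):
    """True for None (a stand-in for null) and for float NaN, else False."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, float("nan"), 0.32, float("inf"), ""]
na_mask = [is_na(v) for v in values]
print(na_mask)  # [False, False, True, True, False, False, False]
```

The last two entries show that infinity and the empty string are treated as ordinary values.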
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
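
For the DataFrame case with one replacement scalar per column, the rule can be modelled on a dict of lists. This is a sketch under stated assumptions: `mask_columns_model` is a hypothetical name, columns are plain lists keyed by name, and a list-valued `other` is matched to columns by position.

```python
def mask_columns_model(columns, cond, other=None):
    """Replace values where cond is True; `other` is a scalar or one scalar per column."""
    names = list(columns)
    if isinstance(other, (list, tuple)):
        if len(other) != len(names):
            raise ValueError("need one replacement value per column")
        fill = dict(zip(names, other))
    else:
        fill = {name: other for name in names}
    return {
        name: [fill[name] if hit else v
               for v, hit in zip(columns[name], cond[name])]
        for name in names
    }


df = {"A": [1, 4, 5], "B": [3, 5, 8]}
cond = {name: [v % 2 == 0 for v in col] for name, col in df.items()}
masked = mask_columns_model(df, cond, [-1, -1])
print(masked)  # {'A': [1, -1, 5], 'B': [3, 5, -1]}
```

Leaving `other` at its default places `None` in the masked positions, which corresponds to the `<NA>` entries in the Series example above.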
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Dimension of the data. Apart from MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
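
The rules for what counts as missing can be sketched in plain Python. This is an illustration only, assuming None stands in for a set null mask; is_not_missing is a hypothetical helper, not a cuDF function.

```python
import math


def is_not_missing(value):
    """Return True unless the value is a null (None here) or a float NaN."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    # Empty strings and infinities are ordinary values, not <NA>.
    return True


values = [5.0, None, float("nan"), float("inf"), ""]
flags = [is_not_missing(v) for v in values]
print(flags)  # [True, False, False, True, True]
```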
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
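
A minimal sketch of the dispatch rule, written against a stand-in container so it runs without a GPU. Pipeable, add and scale are hypothetical names used only for illustration; the real objects are Series, DataFrame and Index.

```python
class Pipeable:
    """Hypothetical stand-in container used only to illustrate pipe()."""

    def __init__(self, data):
        self.data = data

    def pipe(self, func, *args, **kwargs):
        # A (callable, data_keyword) tuple means: pass self under that keyword.
        if isinstance(func, tuple):
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(
                    f"{data_keyword} is both the pipe target and a keyword argument"
                )
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Otherwise self is passed as the first positional argument.
        return func(self, *args, **kwargs)


def add(obj, amount):
    return Pipeable([x + amount for x in obj.data])


def scale(factor, arg2):
    # The data arrives through the 'arg2' keyword, not the first position.
    return Pipeable([x * factor for x in arg2.data])


result = Pipeable([1, 2, 3]).pipe(add, amount=1).pipe((scale, "arg2"), factor=10)
print(result.data)  # [20, 30, 40]
```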
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
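
The five tie-breaking methods are easiest to compare side by side. The following plain-Python sketch illustrates them for a list without missing values; rank_values is a hypothetical helper, it ignores na_option and pct, and it is not cuDF's implementation.

```python
def rank_values(values, method="average", ascending=True):
    """Rank values from 1 to n, resolving ties according to `method`."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method: {method}")
    # Positions of the values in sorted order (stable, so ties keep input order).
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    dense_rank = 0
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole group of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for offset, idx in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[idx] = (start + end) / 2 + 1
            elif method == "min":
                ranks[idx] = float(start + 1)
            elif method == "max":
                ranks[idx] = float(end + 1)
            elif method == "first":
                ranks[idx] = float(start + offset + 1)
            else:  # dense
                ranks[idx] = float(dense_rank)
        start = end + 1
    return ranks


data = [3, 1, 3, 2]
for name in ("average", "min", "max", "first", "dense"):
    print(name, rank_values(data, method=name))
```

For data = [3, 1, 3, 2] the two tied 3s occupy sorted positions 3 and 4, so 'average' gives both 3.5, 'min' gives 3, 'max' gives 4, 'first' gives 3 and 4 in order of appearance, and 'dense' gives 3.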

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.

weightsstr or ndarray-like, optional

Only supported for axis=1/‘columns’.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState is given, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is the stat axis for the given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
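
How n, frac, replace and random_state interact can be sketched with the standard library's random module. This is only an illustration under stated assumptions: sample_items is a hypothetical helper, it rounds frac * len(items) to get the sample size, and it uses Python's generator rather than whatever cuDF uses internally, so the chosen rows will differ from cuDF's for the same seed.

```python
import random


def sample_items(items, n=None, frac=None, replace=False, random_state=None):
    """Return (position, value) pairs, keeping original positions like keep_index=True."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot both be given")
    if n is None:
        # Default is one item when neither n nor frac is provided.
        n = 1 if frac is None else round(frac * len(items))
    rng = random.Random(random_state)  # same seed, same sample
    positions = range(len(items))
    if replace:
        chosen = [rng.choice(positions) for _ in range(n)]
    else:
        chosen = rng.sample(positions, n)
    return [(pos, items[pos]) for pos in chosen]


items = [1, 2, 3, 4, 5]
picked = sample_items(items, n=3, random_state=0)
picked_again = sample_items(items, n=3, random_state=0)
print(picked == picked_again)  # True: a fixed random_state is reproducible
```

Each returned value stays paired with its original position, which is why the index labels in the outputs above are not consecutive.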
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
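
A plain-Python sketch of the scatter step, assuming map_index holds integer partition ids in the range [0, map_size). scatter_rows is a hypothetical helper that works on lists of row tuples rather than DataFrames.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Send rows[i] to output partition map_index[i]; return the list of partitions."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    needed = max(map_index, default=-1) + 1
    if map_size is None:
        map_size = needed
    elif map_size < needed:
        raise ValueError("map_size is too small for the ids in map_index")
    partitions = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        partitions[dest].append(row)
    return partitions


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
parts = scatter_rows(rows, [0, 2, 0, 1])
print(parts)  # [[('a', 1), ('c', 3)], [('d', 4)], [('b', 2)]]
```

Passing a larger map_size simply yields trailing empty partitions, which keeps the output length fixed regardless of which ids actually occur.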
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
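
For a single ascending column, the difference between side='left' and side='right' matches the standard library's bisect functions. The sketch below is an illustration on plain lists; search_sorted is a hypothetical helper and does not cover descending order or null placement.

```python
from bisect import bisect_left, bisect_right


def search_sorted(sorted_values, values, side="left"):
    """Insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(search_sorted(s, [0, 4]))                # [0, 3]
print(search_sorted(s, [1, 3], side="left"))   # [0, 2]
print(search_sorted(s, [1, 3], side="right"))  # [1, 3]
```

'left' returns the position before any equal elements and 'right' the position after them, so the two only differ for values already present in the sorted data.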
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
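
The entry above gives no example, so here is a plain-Python sketch of one plausible reading. It assumes the pandas-style convention that a positive periods moves values toward later positions and that vacated slots take fill_value (None standing in for <NA>); shift_values is a hypothetical helper, and the freq and axis parameters are not modelled.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a list by `periods` positions, padding the vacated slots."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out of range.
        return [fill_value] * n
    if periods > 0:
        return [fill_value] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


data = [10, 20, 30, 40]
print(shift_values(data, 1))                 # [None, 10, 20, 30]
print(shift_values(data, -2))                # [30, 40, None, None]
print(shift_values(data, 1, fill_value=0))   # [0, 10, 20, 30]
```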

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
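
The relationship between the sorted copy and the optional indexer can be sketched in plain Python: the indexer lists, for each output position, the position the value came from. sort_with_indexer is a hypothetical helper working on a list, shown only to illustrate that relationship.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted copy, positions in the original that produce it)."""
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    return [values[i] for i in indexer], indexer


idx = [10, 100, 1, 1000]
sorted_idx, indexer = sort_with_indexer(idx, ascending=False)
print(sorted_idx)  # [1000, 100, 10, 1]
print(indexer)     # [3, 1, 0, 2]
```

Indexing the original with the indexer reproduces the sorted copy, which is what makes the indexer useful for reordering other data in the same way.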
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
property str

Vectorized string functions for Series and Index.

This mimics pandas df.str interface. nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to tile.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
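
tile and repeat are easy to confuse: tile repeats the whole block of rows, while repeat repeats each row consecutively. A plain-Python sketch of the difference on lists of row tuples; tile_rows and repeat_rows are hypothetical helpers, not cuDF functions.

```python
def tile_rows(rows, count):
    """Concatenate `count` copies of the whole row block."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(rows) * count


def repeat_rows(rows, repeats):
    """Repeat each row `repeats` times before moving to the next row."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return [row for row in rows for _ in range(repeats)]


rows = [(8, 4, 7), (5, 2, 3)]
tiled = tile_rows(rows, 2)
repeated = repeat_rows(rows, 2)
print(tiled)     # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
print(repeated)  # [(8, 4, 7), (8, 4, 7), (5, 2, 3), (5, 2, 3)]
```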
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped, so the output can be shorter than the input.
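
The effect of the two fillna settings on output length can be sketched in plain Python. This is an illustration only: to_dense is a hypothetical helper, None stands in for a null, and the result is a list rather than a numpy array.

```python
import math


def to_dense(values, fillna=None):
    """Drop nulls when fillna is None; replace them with NaN when fillna == 'pandas'."""
    if fillna is None:
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Values are converted to float so that NaN can be represented.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
skipped = to_dense(data)
filled = to_dense(data, fillna="pandas")
print(skipped)  # [1, 3]
print(filled)   # [1.0, nan, 3.0]
```

With skipping, positions in the output no longer line up with positions in the input, so 'pandas' is the safer choice when alignment matters.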

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
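
where is the complement of mask: it keeps values where the condition is True and replaces the rest. A plain-Python sketch of that relationship, with None standing in for <NA>; where_values and mask_values are hypothetical helpers, not cuDF functions.

```python
def where_values(values, cond, other=None):
    """Keep entries where cond is True; replace the others with `other`."""
    return [value if flag else other for value, flag in zip(values, cond)]


def mask_values(values, cond, other=None):
    """Replace entries where cond is True with `other`; keep the others."""
    return [other if flag else value for value, flag in zip(values, cond)]


index = [4, 3, 2, 1, 0]
cond = [v > 2 for v in index]

kept = where_values(index, cond, 15)
print(kept)  # [4, 3, 15, 15, 15]

# where(cond) gives the same result as mask with the condition inverted.
inverted = [not flag for flag in cond]
print(kept == mask_values(index, inverted, 15))  # True
```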

DatetimeIndex

class cudf.core.index.DatetimeIndex(data=None, freq=None, tz=None, normalize=False, closed=None, ambiguous='raise', dayfirst=False, yearfirst=False, dtype=None, copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
day

The day of the datetime.

dayofweek

The day of the week with Monday=0, Sunday=6.

dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

hour

The hours of the datetime.

is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

minute

The minutes of the datetime.

month

The month as January=1, December=12.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Dimension of the data.

nlevels

Number of levels.

second

The seconds of the datetime.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

weekday

The day of the week with Monday=0, Sunday=6.

year

The year of the datetime.
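
The numbering conventions of these datetime fields match Python's standard datetime module, which makes them easy to check without a GPU. The sketch below uses a single timestamp for illustration; the dictionary keys simply reuse the attribute names listed above.

```python
from datetime import datetime

stamp = datetime(2018, 10, 7, 12, 0, 0)  # 2018-10-07 was a Sunday

fields = {
    "year": stamp.year,
    "month": stamp.month,          # January=1, December=12
    "day": stamp.day,
    "hour": stamp.hour,
    "minute": stamp.minute,
    "second": stamp.second,
    "dayofweek": stamp.weekday(),  # Monday=0, Sunday=6
}
print(fields["month"], fields["dayofweek"])  # 10 6
```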

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index objects together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element in the Index is True.

append(other)

Append a collection of Index objects together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append also accepts a list of Index objects:

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
array

A cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False:

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
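
The ordering semantics of argsort can be illustrated without a GPU. The following is a minimal plain-Python sketch, not cuDF's implementation. The helper name `argsort_positions` is hypothetical, and the descending case is modelled by reversing the ascending order, which is an assumption about how ties would be handled.

```python
def argsort_positions(values, ascending=True):
    """Return the positions that would sort ``values`` (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Assumption: descending order is the reverse of ascending order.
    return order if ascending else order[::-1]


flat = [10, 100, 1, 1000]
ascending_order = argsort_positions(flat)
descending_order = argsort_positions(flat, ascending=False)

# Tuples compare level by level, like the MultiIndex rows in the example.
rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
row_order = argsort_positions(rows)
```

Using the returned positions as an indexer, for example `[flat[i] for i in ascending_order]`, yields the values in sorted order.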
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
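
The scalar-threshold behaviour can be summarised in a short plain-Python sketch. This is only an illustration under the assumption of scalar `lower` and `upper` bounds, not cuDF's implementation. `clip_values` is a hypothetical name.

```python
def clip_values(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; None disables that bound."""
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # values below the minimum are raised to it
        if upper is not None and value > upper:
            value = upper  # values above the maximum are lowered to it
        clipped.append(value)
    return clipped


data = [1, 2, 3, 4]
both = clip_values(data, lower=2, upper=3)
upper_only = clip_values(data, upper=3)
lower_only = clip_values(data, lower=2)
```

The three results correspond to the three Series calls in the examples above.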
copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
property day

The day of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="D"))
>>> datetime_index
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]')
>>> datetime_index.day
Int16Index([1, 2, 3], dtype='int16')
property dayofweek

The day of the week with Monday=0, Sunday=6.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
            '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
            '2017-01-08'],
            dtype='datetime64[ns]')
>>> datetime_index.dayofweek
Int16Index([5, 6, 0, 1, 2, 3, 4, 5, 6], dtype='int16')
difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
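
The effect of the sort parameter can be sketched in plain Python. This is a hedged illustration of the documented semantics, not the cuDF implementation. `set_difference` is a hypothetical helper operating on lists.

```python
def set_difference(left, right, sort=None):
    """Elements of ``left`` not in ``right``, without duplicates."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: keep the original order instead.
            pass
    return result


sorted_diff = set_difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_diff = set_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
mixed_diff = set_difference([2, "a", 1], [])
```

With sort=False the result keeps the order in which values appear in the calling object.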
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
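
The difference between how='any' and how='all' for multi-level rows can be sketched as follows. This is a plain-Python illustration in which each MultiIndex row is modelled as a tuple and None stands in for <NA>. `drop_null_rows` is a hypothetical name.

```python
def drop_null_rows(rows, how="any"):
    """Drop rows where any (or all) levels are None."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    return [row for row in rows if not test(v is None for v in row)]


rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
kept_any = drop_null_rows(rows, how="any")
kept_all = drop_null_rows(rows, how="all")
```

In this data no row is null at every level, so how='all' keeps all five rows.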
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

True if the Index is empty, False otherwise.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This parameter is currently non-functional.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.
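
The relationship between the returned pair and the labels can be sketched in plain Python. This is an illustration of the inclusive-range convention only, assuming both labels are present. `label_range` is a hypothetical helper, not the cuDF method.

```python
def label_range(labels, first, last):
    """Return (begin, end) so that labels[begin:end] spans first..last."""
    begin = labels.index(first)
    # Position just past the final occurrence of ``last``.
    end = len(labels) - labels[::-1].index(last)
    return begin, end


labels = ["a", "b", "c", "d"]
begin, end = label_range(labels, "b", "c")
selected = labels[begin:end]
```

Because end is one past the position of last, slicing with `labels[begin:end]` includes both endpoints.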

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
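
For a sorted sequence, the left and right bounds described above match the behaviour of Python's bisect module. The sketch below assumes sorted labels and ignores the kind parameter. `slice_bound` is a hypothetical stand-in, not the cuDF method.

```python
from bisect import bisect_left, bisect_right


def slice_bound(sorted_labels, label, side):
    """Leftmost position of ``label``, or one past the rightmost."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 3]
left = slice_bound(labels, 2, "left")
right = slice_bound(labels, 2, "right")
```

Slicing with `labels[left:right]` then selects every occurrence of the label.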

property gpu_values

View the data as a numba device array object

property hour

The hours of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="h"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:00:00',
            '2000-01-01 02:00:00'],
            dtype='datetime64[ns]')
>>> datetime_index.hour
Int16Index([0, 1, 2], dtype='int16')
interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
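
The column-major to row-major conversion can be sketched in plain Python, with each column modelled as a list. This is an illustration of the ordering only. `interleave` is a hypothetical helper, not the cuDF implementation.

```python
def interleave(columns):
    """Flatten equal-length columns row by row into one list."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) walks the rows; each row contributes one value per column.
    return [value for row in zip(*columns) for value in row]


col_a = ["A1", "A2", "A3"]
col_b = ["B1", "B2", "B3"]
interleaved = interleave([col_a, col_b])
```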
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return True if the index values are monotonically decreasing (each value is less than or equal to the previous one).

property is_monotonic_increasing

Return True if the index values are monotonically increasing (each value is greater than or equal to the previous one).
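
The two monotonicity checks allow equal neighbours. A minimal plain-Python sketch of that definition follows. The function names mirror the properties, but these are illustrative helpers on lists, not cuDF code.

```python
def is_monotonic_increasing(values):
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True if every value is <= the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


non_strict_up = is_monotonic_increasing([1, 2, 2, 3])
non_strict_down = is_monotonic_decreasing([3, 3, 1])
neither = (is_monotonic_increasing([1, 3, 2]),
           is_monotonic_decreasing([1, 3, 2]))
```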

property is_unique

Return True if the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
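
The rules above for what counts as missing can be sketched as a plain-Python predicate. This is an illustration only: None stands in for a set null mask, NaT is not modelled, and `is_missing` is a hypothetical helper.

```python
import math


def is_missing(value):
    """True for None or a float NaN; False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


values = [1, 2, None, float("nan"), 0.32, float("inf"), ""]
flags = [is_missing(v) for v in values]
```

Note that inf and the empty string map to False, matching the description.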
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only Scalar or array like with scalars or dataframe with same dimension as self.

Series expects only scalar or series like with same length

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
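
The replacement rule can be sketched in plain Python for the scalar case. This is an illustration only, in which None stands in for <NA>. `mask_values` is a hypothetical helper, not the cuDF method.

```python
def mask_values(values, cond, other=None):
    """Replace values where ``cond`` is True with ``other``."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else value for value, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
replaced = mask_values(ser, cond, 10)
nulled = mask_values(ser, cond)  # no ``other``: entries become None
```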
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property minute

The minutes of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="T"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:01:00',
            '2000-01-01 00:02:00'],
            dtype='datetime64[ns]')
>>> datetime_index.minute
Int16Index([0, 1, 2], dtype='int16')
property month

The month as January=1, December=12.

Examples

>>> import cudf
>>> import pandas as pd
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="M"))
>>> datetime_index
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]')
>>> datetime_index.month
Int16Index([1, 2, 3], dtype='int16')
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property ndim

Number of dimensions of the data. Except for MultiIndex, ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as empty strings '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
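
The dispatch rule behind both calling forms can be sketched as a free function. This is a minimal plain-Python illustration of the documented behaviour, not cuDF's source. The functions `add_offset` and `scale` are hypothetical examples.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj, either positionally or under a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the pipe target "
                             "and a keyword argument")
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def add_offset(data, offset):
    # Takes the data as its first argument.
    return [x + offset for x in data]


def scale(factor, data):
    # Takes the data as its second argument, so it needs the tuple form.
    return [x * factor for x in data]


step_one = pipe([1, 2, 3], add_offset, offset=1)
step_two = pipe(step_one, (scale, "data"), factor=10)
```

The tuple form is what allows `scale` to take part in a chain even though the data is not its first parameter.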
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
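
To make the tie-breaking methods concrete, here is a minimal plain-Python sketch. It is not cuDF's implementation: the function rank below is a hypothetical helper that works on a list of non-null numbers and ignores na_option, pct and axis.

```python
def rank(values, method="average", ascending=True):
    """Rank non-null numbers from 1 through n (illustrative helper only)."""
    order = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not ascending
    )
    ranks = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the run of tied values that starts at sorted position pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[i] = (pos + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = pos + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = pos + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        pos = end + 1
    return ranks


data = [10, 20, 10, 30]
print(rank(data, "average"))  # [1.5, 3.0, 1.5, 4.0]
print(rank(data, "dense"))    # [1, 2, 1, 3]
```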

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If RandomState, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
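
The interplay of n, frac, replace and random_state can be illustrated with the standard library. The sketch below is an assumption-laden stand-in, not cuDF code: sample is a hypothetical helper over a plain list, it uses Python's random.Random as the generator, and it does not model weights, axis or keep_index.

```python
import random


def sample(items, n=None, frac=None, replace=False, random_state=None):
    """Sample from a list, mirroring the n/frac/replace rules described above."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot both be given")
    if n is None:
        # Default is one item unless a fraction was requested.
        n = 1 if frac is None else round(frac * len(items))
    rng = random.Random(random_state)  # same seed gives the same draw
    if replace:
        return [rng.choice(items) for _ in range(n)]
    return rng.sample(items, n)


items = [1, 2, 3, 4, 5]
first = sample(items, 3, random_state=42)
second = sample(items, 3, random_state=42)
print(first == second)  # True: a fixed seed makes the draw reproducible
```

Without replace=True, asking for more items than exist raises an error in this sketch, which is why the Series example above passes replace=True when drawing 10 items from 5.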
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= uniques in map_index

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
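
No example is given for this method, so the following plain-Python sketch shows the idea only. It assumes map_index holds one integer destination per row in the range 0 to map_size - 1; the function below is a hypothetical stand-in that partitions a list of rows and does not model keep_index or column-name lookups.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Split rows into map_size partitions according to map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumed default: just enough partitions for the largest destination.
        map_size = max(map_index, default=-1) + 1
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} outside 0..{map_size - 1}")
        parts[dest].append(row)
    return parts


rows = ["a", "b", "c", "d"]
parts = scatter_by_map(rows, [1, 0, 1, 2], map_size=4)
print(parts)  # [['b'], ['a', 'c'], ['d'], []]
```

A map_size larger than the number of distinct destinations simply yields empty trailing partitions, as in the last element above.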
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (Shape must be consistent with self)

Values to be hypothetically inserted into self

sidestr {‘left’, ‘right’} optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool optional, default True

Sorted Frame is in ascending order (otherwise descending)

na_positionstr {‘last’, ‘first’} optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
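
The side parameter corresponds to the left/right variants of binary search. As a rough single-column, ascending-only analogue (not cuDF code), Python's bisect module reproduces the Series results shown above; the wrapper searchsorted here is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]

# Binary search assumes sorted input; on unsorted data the answer is unreliable.
print(searchsorted([2, 1, 3], [1]))           # [0]
```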
property second

The seconds of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="s"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:01',
            '2000-01-01 00:00:02'],
            dtype='datetime64[ns]')
>>> datetime_index.second
Int16Index([0, 1, 2], dtype='int16')
set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.
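
The entry above gives no example. The sketch below assumes pandas-like semantics, which this page does not spell out: a positive periods moves values toward the end, a negative one toward the start, and vacated slots receive fill_value (None standing in for null). The helper shift is hypothetical and ignores freq and axis.

```python
def shift(values, periods=1, fill_value=None):
    """Shift a list by periods positions, padding with fill_value."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out of range.
        return [fill_value] * n
    if periods > 0:
        return [fill_value] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift([1, 2, 3, 4], 1))                  # [None, 1, 2, 3]
print(shift([1, 2, 3, 4], -2, fill_value=0))   # [3, 4, 0, 0]
```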

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
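
The relationship between the sorted copy and the indexer can be sketched with plain Python lists. This is an illustration, not cuDF's implementation: sort_values below is a hypothetical helper, and tuples stand in for MultiIndex entries.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Return a sorted copy and, optionally, the positions that produce it."""
    indexer = sorted(
        range(len(values)), key=values.__getitem__, reverse=not ascending
    )
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result


idx = [10, 100, 1, 1000]
print(sort_values(idx))
# [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])

# Tuples compare element by element, like the MultiIndex example above.
midx = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
print(sort_values(midx))
# [(-10, 1), (1, 1), (1, 5), (3, 11), (4, 11)]
```

Applying the indexer to the original values always reproduces the sorted result, which is what makes it useful for reordering companion data.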
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
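
tile is easy to confuse with repeat: tile replays the whole block of rows, while repeat duplicates each row in place. A plain-Python sketch of the difference, using hypothetical helpers over a list of row tuples rather than cuDF objects:

```python
def tile(rows, count):
    """Replay the whole sequence of rows count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(rows) * count


def repeat(rows, repeats):
    """Duplicate each row consecutively repeats times."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return [row for row in rows for _ in range(repeats)]


rows = [(8, 4, 7), (5, 2, 3)]
print(tile(rows, 2))    # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
print(repeat(rows, 2))  # [(8, 4, 7), (8, 4, 7), (5, 2, 3), (5, 2, 3)]
```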
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output can be shorter than the input.
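
The two fillna modes behave quite differently with respect to output length. A minimal sketch of the rule as described here, with None standing in for a null value; to_array below is a hypothetical list-based helper, not the cuDF method.

```python
import math


def to_array(values, fillna=None):
    """Drop nulls by default, or replace them with NaN when fillna='pandas'."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # NaN needs a floating-point container, hence the promotion to float.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
print(to_array(data))                 # [1, 3]
print(len(to_array(data, "pandas")))  # 3
```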

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
property weekday

The day of the week with Monday=0, Sunday=6.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
            '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
            '2017-01-08'],
            dtype='datetime64[ns]')
>>> datetime_index.weekday
Int16Index([5, 6, 0, 1, 2, 3, 4, 5, 6], dtype='int16')
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
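
where and mask are mirror images: where keeps values whose condition is True, mask replaces them. The following plain-Python sketch illustrates that relationship on lists; the helpers are hypothetical and only cover a scalar or same-length list for other.

```python
def where(values, cond, other=None):
    """Keep values where cond is True, otherwise take the value from other."""
    others = other if isinstance(other, list) else [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


def mask(values, cond, other=None):
    """Replace values where cond is True (the inverse of where)."""
    return where(values, [not c for c in cond], other)


index = [4, 3, 2, 1, 0]
cond = [v > 2 for v in index]
print(where(index, cond, 15))  # [4, 3, 15, 15, 15]
print(mask(index, cond, 15))   # [15, 15, 2, 1, 0]
```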
property year

The year of the datetime.

Examples

>>> import cudf
>>> import pandas as pd
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="Y"))
>>> datetime_index
DatetimeIndex(['2000-12-31', '2001-12-31', '2002-12-31'], dtype='datetime64[ns]')
>>> datetime_index.year
Int16Index([2000, 2001, 2002], dtype='int16')

TimedeltaIndex

class cudf.core.index.TimedeltaIndex(data=None, unit=None, freq=None, closed=None, dtype='timedelta64[ns]', copy=False, name=None)

Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.

Parameters
dataarray-like (1-dimensional)/ DataFrame

If it is a DataFrame, it will return a MultiIndex

dtypeNumPy dtype (default: object)

If dtype is None, we find the dtype that best fits the data.

copybool

Make a copy of input data.

nameobject

Name to be stored in the index.

tupleize_colsbool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.

Returns
Index

cudf Index

Examples

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex([(1, 2),
            (2, 3)],
          names=['a', 'b'])
Attributes
components

Return a dataframe of the components (days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds) of the Timedeltas.

days

Number of days for each element.

dtype

dtype of the underlying values in GenericIndex.

empty

Indicator whether Index is empty.

gpu_values

View the data as a numba device array object

inferred_freq
is_monotonic

Alias for is_monotonic_increasing.

is_monotonic_decreasing

Return if the index is monotonic decreasing (only equal or decreasing) values.

is_monotonic_increasing

Return if the index is monotonic increasing (only equal or increasing) values.

is_unique

Return if the index has unique values.

microseconds

Number of microseconds (>= 0 and less than 1 second) for each element.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

nanoseconds

Number of nanoseconds (>= 0 and less than 1 microsecond) for each element.

ndim

Dimension of the data.

nlevels

Number of levels.

seconds

Number of seconds (>= 0 and less than 1 day) for each element.

shape

Returns a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

values

Return an array representing the data in the Index.

values_host

Return a numpy representation of the Index.

Methods

acos()

Get Trigonometric inverse cosine, element-wise.

any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

argsort([ascending])

Return the integer indices that would sort the index.

asin()

Get Trigonometric inverse sine, element-wise.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

atan()

Get Trigonometric inverse tangent, element-wise.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([name, deep, dtype, names])

Make a copy of this object.

cos()

Get Trigonometric cosine, element-wise.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep])

Return Index with duplicate values removed

dropna([how])

Return an Index with null values removed.

equals(other, **kwargs)

Determine if two Index objects contain the same elements.

exp()

Get the exponential of all elements, element-wise.

factorize([na_sentinel])

Encode the input values as integer labels

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label.

interleave_columns()

Interleave Series columns of a table into a single column.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

log()

Get the natural logarithm of all elements, element-wise.

mask(cond[, other, inplace])

Replace values where the condition is True.

max()

Return the maximum value of the Index.

memory_usage([deep])

Memory usage of the values.

min()

Return the minimum value of the Index.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

rank([axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeats elements consecutively.

round([decimals])

Round a DataFrame to a variable number of decimal places.

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, …])

Find indices where elements should be inserted to maintain order

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

sort_values([return_indexer, ascending, key])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

sqrt()

Get the non-negative square-root of all elements, element-wise.

sum()

Return the sum of all values of the Index.

take(indices)

Gather only the specific subset of indices

tan()

Get Trigonometric tangent, element-wise.

tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

to_array([fillna])

Get a dense numpy array for the data.

to_arrow()

Convert Index to PyArrow Array

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_pandas()

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

unique()

Return unique values in the index.

where(cond[, other])

Replace values where the condition is False.

replace

acos()

Get Trigonometric inverse cosine, element-wise.

The inverse of cos so that, if y = x.cos(), then x = y.acos()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64

acos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629

acos operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
            1.5707963267948966,  1.266103672779499],
            dtype='float64')
any()

Return whether any element is True in Index.

append(other)

Append a collection of Index options together.

Parameters
otherIndex or list/tuple of indices
Returns
appendedIndex

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(ascending=True, **kwargs)

Return the integer indices that would sort the index.

Parameters
ascendingbool, default True

If True, returns the indices for ascending order. If False, returns the indices for descending order.

Returns
arrayA cupy array containing integer indices that would sort the index if used as an indexer.

Examples

>>> import cudf
>>> index = cudf.Index([10, 100, 1, 1000])
>>> index
Int64Index([10, 100, 1, 1000], dtype='int64')
>>> index.argsort()
array([2, 0, 1, 3], dtype=int32)

The order of argsort can be reversed by setting the ascending parameter to False.

>>> index.argsort(ascending=False)
array([3, 1, 0, 2], dtype=int32)

argsort on a MultiIndex:

>>> index = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> index
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> index.argsort()
array([4, 0, 1, 2, 3], dtype=int32)
>>> index.argsort(ascending=False)
array([3, 2, 1, 0, 4], dtype=int32)
asin()

Get Trigonometric inverse sine, element-wise.

The inverse of sine so that, if y = x.sin(), then x = y.asin()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64

asin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167

asin operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
            1.5707963267948966, 0.3046926540153975],
            dtype='float64')
astype(dtype, copy=False)

Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters
dtypenumpy dtype

Use a numpy.dtype to cast entire Index object to.

copybool, default False

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns
Index

Index with values cast to specified dtype.

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
atan()

Get Trigonometric inverse tangent, element-wise.

The inverse of tan so that, if y = x.tan(), then x = y.atan()

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64

atan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128

atan operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483,  0.3805063771123649,
                            0.7853981633974483, 0.0,
                            0.2914567944778671],
            dtype='float64')
clip(lower=None, upper=None, inplace=False, axis=1)

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.

Parameters
lowerscalar or array_like, default None

Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.

upperscalar or array_like, default None

Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.

inplacebool, default False
Returns
Clipped DataFrame/Series/Index/MultiIndex

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
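
The clipping rule for scalar thresholds can be sketched in plain Python, with no GPU required. This is only an illustration of the semantics described above, not how cuDF implements clip; the helper name clip_values is hypothetical.

```python
def clip_values(values, lower=None, upper=None):
    """Clamp each element to [lower, upper]; a bound of None is ignored."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold: raise to it
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold: lower to it
        clipped.append(v)
    return clipped


print(clip_values([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_values([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
print(clip_values(["a", "b", "c", "d"], lower="b", upper="c"))  # ['b', 'b', 'c', 'c']
```

The same comparison works for strings because Python orders them lexicographically, which mirrors the string column in the DataFrame example.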
property components

Return a dataframe of the components (days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds) of the Timedeltas.

copy(name=None, deep=False, dtype=None, names=None)

Make a copy of this object.

Parameters
nameobject, default None

Name of index, use original name when None

deepbool, default False

Make a deep copy of the data. With deep=False the original data is used

dtypenumpy dtype, default None

Target datatype to cast into, use original dtype when None

nameslist-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Returns
New index instance, cast to new dtype
cos()

Get Trigonometric cosine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64

cos operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097

cos operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
            -0.5984600690578581, -0.4480736161291701],
            dtype='float64')
property days

Number of days for each element.

difference(other, sort=None)

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters
otherIndex or array-like
sortFalse or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns
differenceIndex

Examples

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
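
The effect of the sort parameter can be sketched on plain Python lists. This is a CPU-only illustration under the assumption that the result keeps one copy of each surviving element; the helper name index_difference is hypothetical and is not part of cuDF.

```python
def index_difference(left, right, sort=None):
    """Return the elements of `left` that do not appear in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for item in left:
        if item not in exclude and item not in seen:
            seen.add(item)
            result.append(item)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: fall back to the original order
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```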
drop_duplicates(keep='first')

Return Index with duplicate values removed

Parameters
keep{‘first’, ‘last’, False}, default ‘first’
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

Returns
deduplicatedIndex

Examples

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx
StringIndex(['lama' 'cow' 'lama' 'beetle' 'lama' 'hippo'], dtype='object')
>>> idx.drop_duplicates()
StringIndex(['beetle' 'cow' 'hippo' 'lama'], dtype='object')
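
The three keep options can be compared with a short plain-Python sketch. It only models which occurrences survive; it keeps the input order, whereas the cuDF example above shows a sorted result. The function below is hypothetical and is not cuDF's implementation.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Remove duplicates from a list according to `keep`."""
    if keep is False:
        counts = Counter(values)
        # Drop every value that occurs more than once.
        return [v for v in values if counts[v] == 1]
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Keeping the last occurrence is keeping the first one of the
        # reversed sequence, then restoring the original direction.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    raise ValueError("keep must be 'first', 'last' or False")


data = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
print(drop_duplicates(data, keep="first"))  # ['lama', 'cow', 'beetle', 'hippo']
print(drop_duplicates(data, keep="last"))   # ['cow', 'beetle', 'lama', 'hippo']
print(drop_duplicates(data, keep=False))    # ['cow', 'beetle', 'hippo']
```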
dropna(how='any')

Return an Index with null values removed.

Parameters
how{‘any’, ‘all’}, default ‘any’

If the Index is a MultiIndex, drop the value when any or all levels are NaN.

Returns
validIndex

Examples

>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')

Using dropna on a MultiIndex:

>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex([(   1, 1),
            (   1, 5),
            (<NA>, 2),
            (   4, 2),
            (<NA>, 1)],
           names=['x', 'y'])
>>> midx.dropna()
MultiIndex([(1, 1),
            (1, 5),
            (4, 2)],
           names=['x', 'y'])
property dtype

dtype of the underlying values in GenericIndex.

property empty

Indicator whether Index is empty.

True if Index is entirely empty (no items).

Returns
outbool

If Index is empty, return True, if not return False.

Examples

>>> import cudf
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.empty
True
equals(other, **kwargs)

Determine if two Index objects contain the same elements.

Returns
out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

exp()

Get the exponential of all elements, element-wise.

Exponential is the inverse of the log function, so that x.exp().log() = x

Returns
DataFrame/Series/Index

Result of the element-wise exponential.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64

exp operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795

exp operation on Index:

>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233,  1.4918246976412703,
              2.718281828459045, 1.0,  1.3498588075760032],
            dtype='float64')
factorize(na_sentinel=-1)

Encode the input values as integer labels

See also

cudf.core.series.Series.factorize

Encode the input values of Series.

fillna(value, downcast=None)

Fill null values with the specified value.

Parameters
valuescalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcastdict, default is None

This Parameter is currently NON-FUNCTIONAL.

Returns
filledIndex

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(first, last)

Find range that starts with first and ends with last, inclusively.

Returns
begin, end2-tuple of int

The starting index and the ending index. The last value occurs at end - 1 position.

classmethod from_arrow(array)

Convert PyArrow Array/ChunkedArray to Index

Parameters
arrayPyArrow Array/ChunkedArray

PyArrow Object which has to be converted to Index

Returns
cudf Index
Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
classmethod from_pandas(index, nan_as_null=None)

Convert from a Pandas Index.

Parameters
indexPandas Index object

A Pandas Index object which has to be converted to cuDF Index.

nan_as_nullbool, Default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises
TypeError for invalid input type.

Examples

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters
levelint or str

It is either the integer position or the name of the level.

Returns
Index

Calling object, as there is only one level in the Index.

See also

cudf.core.multiindex.MultiIndex.get_level_values

Get values for a level of a MultiIndex.

Notes

For Index, level should be 0, since there are no multiple levels.

Examples

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_slice_bound(label, side, kind)

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters
labelobject
side{‘left’, ‘right’}
kind{‘ix’, ‘loc’, ‘getitem’}
Returns
int

Index of label.
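
The left and right bounds can be sketched with the standard-library bisect module. This assumes the labels are sorted in ascending order and does not model the kind parameter; the helper name slice_bound is hypothetical.

```python
from bisect import bisect_left, bisect_right


def slice_bound(sorted_labels, label, side):
    """Leftmost position of `label`, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
begin = slice_bound(labels, 20, "left")
end = slice_bound(labels, 20, "right")
print(begin, end)         # 1 3
print(labels[begin:end])  # [20, 20]
```

Because the right bound is one past the last match, the pair can be used directly as a half-open Python slice.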

property gpu_values

View the data as a numba device array object

interleave_columns()

Interleave Series columns of a table into a single column.

Converts the column major table cols into a row major column.

Parameters
colsinput Table containing columns to interleave.
Returns
The interleaved columns as a single column

Examples

>>> import cudf
>>> df = cudf.DataFrame([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']])
>>> df
0    [A1, A2, A3]
1    [B1, B2, B3]
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
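
The column-major to row-major conversion can be sketched with plain lists, where each inner list stands in for one column. This is an illustration of the ordering only; the function name interleave is hypothetical.

```python
def interleave(columns):
    """Flatten equal-length columns row by row into a single list."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; emit its values in column order.
    return [value for row in zip(*columns) for value in row]


print(interleave([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```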
property is_monotonic

Alias for is_monotonic_increasing.

property is_monotonic_decreasing

Return whether the index values are monotonically decreasing (each value is less than or equal to the one before it).

property is_monotonic_increasing

Return whether the index values are monotonically increasing (each value is greater than or equal to the one before it).

property is_unique

Return whether the index has unique values.

isin(values)

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters
valuesset, list-like, Index

Sought values.

Returns
is_containedcupy array

CuPy array of boolean values.

Examples

>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
isnull()

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True. Everything else gets mapped to False. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isnull()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isnull()
GenericIndex([False, False, True, True, False, False], dtype='bool')
join(other, how='left', level=None, return_indexers=False, sort=False)

Compute join_index and indexers to conform data structures to the new index.

Parameters
otherIndex.
how{‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexersbool, default False
sortbool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples

>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
log()

Get the natural logarithm of all elements, element-wise.

Natural logarithm is the inverse of the exp function, so that x.log().exp() = x

Returns
DataFrame/Series/Index

Result of the element-wise natural logarithm.

Examples

>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64

log operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585

log operation on Index:

>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707,
            6.214608098422191], dtype='float64')
mask(cond, other=None, inplace=False)

Replace values where the condition is True.

Parameters
condbool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns
Same type as caller

Examples

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
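
The replacement rule for a scalar other can be sketched on plain lists, with None standing in for <NA>. This is a CPU-only illustration, not cuDF's implementation; the helper name mask_values is hypothetical.

```python
def mask_values(values, cond, other=None):
    """Keep `v` where the condition is False, use `other` where it is True."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [other if flag else v for v, flag in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask_values(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask_values(ser, cond))      # [None, None, 2, 1, 0]
```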
max()

Return the maximum value of the Index.

Returns
scalar

Maximum value.

See also

cudf.core.index.Index.min

Return the minimum value in an Index.

cudf.core.series.Series.max

Return the maximum value in a Series.

cudf.core.dataframe.DataFrame.max

Return the maximum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
memory_usage(deep=False)

Memory usage of the values.

Parameters
deepbool

Introspect the data deeply, interrogate object dtypes for system-level memory consumption.

Returns
bytes used
property microseconds

Number of microseconds (>= 0 and less than 1 second) for each element.

min()

Return the minimum value of the Index.

Returns
scalar

Minimum value.

See also

cudf.core.index.Index.max

Return the maximum value in an Index.

cudf.core.series.Series.min

Return the minimum value in a Series.

cudf.core.dataframe.DataFrame.min

Return the minimum values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
property name

Returns the name of the Index.

property names

Returns a tuple containing the name of the Index.

property nanoseconds

Number of nanoseconds (>= 0 and less than 1 microsecond) for each element.

property ndim

Dimension of the data. Apart from MultiIndex ndim is always 1.

property nlevels

Number of levels.

notna()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
notnull()

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Values such as the empty string '' or inf (in the case of float) are not considered <NA> values.

Returns
DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notnull()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notnull()
GenericIndex([True, True, False, False, True, True], dtype='bool')
pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Parameters
funcfunction

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

Examples

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
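
Both calling conventions can be sketched with a small stand-in class, so the example runs without a GPU. The class Pipeable and the functions add and scale are hypothetical names used for illustration; the dispatch logic is an assumption about how such a method is typically written, not cuDF's source.

```python
class Pipeable:
    """Minimal stand-in for an object that offers .pipe()."""

    def __init__(self, data):
        self.data = data

    def pipe(self, func, *args, **kwargs):
        if isinstance(func, tuple):
            # (callable, data_keyword): pass self under that keyword.
            func, data_keyword = func
            if data_keyword in kwargs:
                raise ValueError(f"{data_keyword} is both the pipe target and a keyword argument")
            kwargs[data_keyword] = self
            return func(*args, **kwargs)
        # Plain callable: self is the first positional argument.
        return func(self, *args, **kwargs)


def add(obj, amount):
    return Pipeable([x + amount for x in obj.data])


def scale(factor, frame):
    # The data arrives as the second argument, named `frame`.
    return Pipeable([x * factor for x in frame.data])


result = Pipeable([1, 2, 3]).pipe(add, 1).pipe((scale, "frame"), factor=10)
print(result.data)  # [20, 30, 40]
```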
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters
axis{0 or ‘index’, 1 or ‘columns’}, default 0

Index to direct ranking.

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

How to rank the group of records that have the same value (i.e. ties):

  • average: average rank of the group

  • min: lowest rank in the group

  • max: highest rank in the group

  • first: ranks assigned in order they appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_onlybool, optional

For DataFrame objects, rank only numeric columns if set to True.

na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’

How to rank NaN values:

  • keep: assign NaN rank to NaN values

  • top: assign smallest rank to NaN values if ascending

  • bottom: assign highest rank to NaN values if ascending.

ascendingbool, default True

Whether or not the elements should be ranked in ascending order.

pctbool, default False

Whether or not to display the returned rankings in percentile form.

Returns
same type as caller

Return a Series or DataFrame with data ranks as values.
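
The tie-breaking methods can be compared with a plain-Python sketch. It handles a single list with no missing values and ignores na_option and pct; the function name rank_values is hypothetical and the code is not cuDF's implementation.

```python
def rank_values(values, method="average", ascending=True):
    """Rank `values` from 1 to n, resolving ties according to `method`."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method}")
    n = len(values)
    # Positions in sorted order; the sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0] * n
    dense = 0
    start = 0
    while start < n:
        # Extend `end` over the run of equal values (one tie group).
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[i] = (start + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = start + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = start + 1 + offset
            else:  # dense
                ranks[i] = dense
        start = end + 1
    return ranks


data = [10, 20, 10, 30]
print(rank_values(data, "average"))  # [1.5, 3.0, 1.5, 4.0]
print(rank_values(data, "min"))      # [1, 3, 1, 4]
print(rank_values(data, "max"))      # [2, 3, 2, 4]
print(rank_values(data, "first"))    # [1, 3, 2, 4]
print(rank_values(data, "dense"))    # [1, 2, 1, 3]
```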

rename(name, inplace=False)

Alter Index name.

Defaults to returning new index.

Parameters
namelabel

Name(s) to set.

Returns
Index

Examples

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)

Repeats elements consecutively.

Returns a new object of caller type(DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.

Parameters
repeatsint, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns
Series/DataFrame/Index

A newly created object of same type as caller with repeated elements.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30

Repeat on Series

>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64

Repeat on Index

>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
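
The difference between a single repeat count and one count per element can be sketched on plain lists. This is an illustration only; the helper name repeat_values is hypothetical.

```python
def repeat_values(values, repeats):
    """Repeat each element consecutively; `repeats` is an int or a list of ints."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(count < 0 for count in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, count in zip(values, repeats) for _ in range(count)]


print(repeat_values([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat_values([0, 2], 2))       # [0, 0, 2, 2]
print(repeat_values([0, 2], 0))       # []
```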
round(decimals=0)

Round a DataFrame to a variable number of decimal places.

Parameters
decimalsint, dict, Series

Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

Returns
DataFrame

A DataFrame with the affected columns rounded to the specified number of decimal places.

Examples

>>> df = cudf.DataFrame(
...     [(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...     columns=['dogs', 'cats']
... )
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = cudf.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/“columns”.

weightsstr or ndarray-like, optional

Only supported for axis=1/“columns”.

random_stateint, numpy RandomState or None, default None

Seed for the random number generator (if int), or None. If None, a random seed will be chosen. If a RandomState is given, the seed will be extracted from its current state.

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is the stat axis for the given data type (0 for Series and DataFrames). Series and Index do not support axis=1.

Returns
Series or DataFrame or Index

A new object of same type as caller containing n items randomly sampled from the caller object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
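
The role of an integer random_state can be sketched with the standard-library random module: the same seed yields the same sample. This only illustrates the reproducibility idea for sampling without replacement; the helper name sample_items is hypothetical and the generator differs from the one cuDF uses.

```python
import random


def sample_items(values, n, random_state=None):
    """Draw n distinct items; an int random_state seeds a private generator."""
    rng = random.Random(random_state)  # None seeds from system entropy
    return rng.sample(list(values), n)


first = sample_items(range(100), 5, random_state=42)
second = sample_items(range(100), 5, random_state=42)
print(first == second)  # True
```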
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)

Scatter to a list of dataframes.

Uses map_index to determine the destination of each row of the original DataFrame.

Parameters
map_indexSeries, str or list-like

Scatter assignment for each row

map_sizeint

Length of output list. Must be >= the number of unique values in map_index.

keep_indexbool

Conserve original index values for each row

Returns
A list of cudf.DataFrame objects.
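
The scatter step can be sketched with plain lists, where each row is sent to the partition named by its map entry. The sketch assumes the map holds integers in the range 0 to map_size - 1; the function name scatter_rows is hypothetical and nothing here reflects cuDF's GPU implementation.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Split `rows` into a list of partitions chosen by `map_index`."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    if map_size is None:
        # Assumption: default to just enough partitions for the largest id.
        map_size = max(map_index) + 1 if map_index else 0
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        partitions[destination].append(row)
    return partitions


rows = ["r0", "r1", "r2", "r3"]
print(scatter_rows(rows, [1, 0, 1, 2]))
# [['r1'], ['r0', 'r2'], ['r3']]
print(scatter_rows(rows, [1, 0, 1, 2], map_size=4))
# [['r1'], ['r0', 'r2'], ['r3'], []]
```

A map_size larger than the number of distinct destinations simply leaves the extra partitions empty.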
searchsorted(values, side='left', ascending=True, na_position='last')

Find indices where elements should be inserted to maintain order

Parameters
valuesFrame (shape must be consistent with self)

Values to be hypothetically inserted into self.

sidestr {‘left’, ‘right’}, optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascendingbool, optional, default True

Sorted Frame is in ascending order (otherwise descending).

na_positionstr {‘last’, ‘first’}, optional, default ‘last’

Position of null values in sorted order

Returns
1-D cupy array of insertion points

Examples

>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
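
For a single ascending column, the ‘left’ and ‘right’ behavior matches binary search as provided by Python's standard bisect module. The sketch below is a CPU illustration only; insertion_points is a hypothetical helper and does not cover the ascending or na_position options.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    """Return where each value would be inserted to keep sorted_values sorted."""
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
left = insertion_points(s, [1, 3], side="left")    # [0, 2]
right = insertion_points(s, [1, 3], side="right")  # [1, 3]
outside = insertion_points(s, [0, 4])               # [0, 3]
```

With side="left" an existing equal element ends up after the inserted value; with side="right" it ends up before it.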
property seconds

Number of seconds (>= 0 and less than 1 day) for each element.

set_names(names, level=None, inplace=False)

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters
nameslabel or list of label

Name(s) to set.

levelint, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplacebool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns
Index

The same type as the caller or None if inplace is True.

See also

cudf.core.index.Index.rename

Able to set new names without level.

Examples

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape

Returns a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None, axis=0, fill_value=None)

Shift values by periods positions.

sin()

Get Trigonometric sine, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64

sin operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756

sin operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
            0.8011526357338306, 0.8939966636005579],
            dtype='float64')
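
The results above show that inputs are interpreted as radians (sin of 90 is about 0.894, not 1). A small sketch with Python's math module, used here as a CPU stand-in, shows the conversion needed when the data is in degrees:

```python
import math

values = [0.0, 45, 90]

# Treat the inputs as radians, as in the examples above.
as_radians = [math.sin(v) for v in values]

# Convert degrees to radians first when the data is expressed in degrees.
as_degrees = [math.sin(math.radians(v)) for v in values]
```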
property size

Return the number of elements in the underlying data.

Returns
sizeSize of the DataFrame / Index / Series / MultiIndex

Examples

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
sort_values(return_indexer=False, ascending=True, key=None)

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters
return_indexerbool, default False

Should the indices that would sort the index be returned.

ascendingbool, default True

Should the index values be sorted in an ascending order.

keyNone, optional

This parameter is NON-FUNCTIONAL.

Returns
sorted_indexIndex

Sorted copy of the index.

indexercupy.ndarray, optional

The indices that the index itself was sorted by.

See also

cudf.core.series.Series.sort_values

Sort values of a Series.

cudf.core.dataframe.DataFrame.sort_values

Sort values in a DataFrame.

Examples

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
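
The relationship between the sorted copy and the indexer returned with return_indexer=True can be sketched in plain Python. The helper sort_with_indexer is hypothetical and works on a list rather than a cuDF Index.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted copy, positions in the original that produce it)."""
    indexer = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not ascending
    )
    return [values[i] for i in indexer], indexer


sorted_values, indexer = sort_with_indexer([10, 100, 1, 1000], ascending=False)
```

Indexing the original values with the indexer reproduces the sorted copy, so the same indexer can be used to reorder any data aligned with the index.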
sqrt()

Get the non-negative square-root of all elements, element-wise.

Returns
DataFrame/Series/Index

Result of the non-negative square-root of each element.

Examples

>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64

sqrt operation on DataFrame:

>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456

sqrt operation on Index:

>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
sum()

Return the sum of all values of the Index.

Returns
scalar

Sum of all values.

Examples

>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
take(indices)

Gather only the specific subset of indices

Parameters
indices: An array-like that maps to values contained in this Index.
tan()

Get Trigonometric tangent, element-wise.

Returns
DataFrame/Series/Index

Result of the trigonometric operation.

Examples

>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64

tan operation on DataFrame:

>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742

tan operation on Index:

>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
            -1.3386902103511544, -1.995200412208242],
            dtype='float64')
tile(count)

Repeats the rows from self DataFrame count times to form a new DataFrame.

Parameters
selfinput Table containing columns to interleave.
countNumber of times to tile “rows”. Must be non-negative.
Returns
The table containing the tiled “rows”.

Examples

>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
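
The row order produced by tiling can be sketched with nested lists: the whole block of rows is repeated count times, rather than each row being repeated in place. The helper tile_rows is a hypothetical CPU illustration.

```python
def tile_rows(rows, count):
    """Repeat the full sequence of rows count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


tiled = tile_rows([[8, 4, 7], [5, 2, 3]], 2)
```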
to_array(fillna=None)

Get a dense numpy array for the data.

Parameters
fillnastr or None

Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.

Notes

If fillna is None, null values are skipped. Therefore, the output size could be smaller.

to_arrow()

Convert Index to PyArrow Array

Returns
PyArrow Array

Examples

>>> import cudf
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_dlpack()

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters
cudf_objDataFrame, Series, Index, or Column
Returns
pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=None)

Create a DataFrame with a column containing this Index

Parameters
indexboolean, default True

Set the index of the returned DataFrame as the original Index

namestr, default None

Name to be used for the column

Returns
DataFrame

cudf DataFrame

to_pandas()

Convert to a Pandas Index.

Examples

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
to_series(index=None, name=None)

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters
indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns
Series

The dtype will be based on the type of the Index values.

unique()

Return unique values in the index.

Returns
Index without duplicates
property values

Return an array representing the data in the Index.

Returns
arrayA cupy array of data in the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
property values_host

Return a numpy representation of the Index.

Only the values in the Index will be returned.

Returns
outnumpy.ndarray

The values of the Index.

Examples

>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
where(cond, other=None)

Replace values where the condition is False.

Parameters
condbool array-like with the same length as self

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, or array-like

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

Returns
Same type as caller

Examples

>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
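
The selection rule can be sketched element by element in plain Python. This hypothetical where helper handles only a scalar other and is meant to illustrate the semantics, not the GPU implementation.

```python
def where(values, cond, other=None):
    """Keep each value where cond is True, otherwise use other."""
    return [v if keep else other for v, keep in zip(values, cond)]


index = [4, 3, 2, 1, 0]
result = where(index, [v > 2 for v in index], 15)
```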

Categories

class cudf.core.column.categorical.CategoricalAccessor(column: Any, parent: Optional[Union[cudf.Series, cudf.Index]] = None)

Accessor object for categorical properties of the Series values. Be aware that assigning to categories is an in-place operation, while all methods return new categorical data by default.

Parameters
columnColumn
parentSeries or CategoricalIndex

Examples

>>> s = cudf.Series([1,2,3], dtype='category')
>>> s
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
>>> s.cat.categories
Int64Index([1, 2, 3], dtype='int64')
>>> s.cat.reorder_categories([3,2,1])
0    1
1    2
2    3
dtype: category
Categories (3, int64): [3, 2, 1]
>>> s.cat.remove_categories([1])
0    <NA>
1       2
2       3
dtype: category
Categories (2, int64): [2, 3]
>>> s.cat.set_categories(list('abcde'))
0    <NA>
1    <NA>
2    <NA>
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']
>>> s.cat.as_ordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1 < 2 < 3]
>>> s.cat.as_unordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
Attributes
categories

The categories of this categorical.

codes

Return Series of codes as well as the index.

ordered

Whether the categories have an ordered relationship.

Methods

add_categories(new_categories[, inplace])

Add new categories.

as_ordered([inplace])

Set the Categorical to be ordered.

as_unordered([inplace])

Set the Categorical to be unordered.

remove_categories(removals[, inplace])

Remove the specified categories.

reorder_categories(new_categories[, …])

Reorder categories as specified in new_categories.

set_categories(new_categories[, ordered, …])

Set the categories to the specified new_categories.

add_categories(new_categories: Any, inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Add new categories.

new_categories will be included at the last/highest place in the categories and will be unused directly after this call.

Parameters
new_categoriescategory or list-like of category

The new categories to be included.

inplacebool, default False

Whether or not to add the categories inplace or return a copy of this categorical with added categories.

Returns
cat

Categorical with new categories added or None if inplace.

Examples

>>> import cudf
>>> s = cudf.Series([1, 2], dtype="category")
>>> s
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]
>>> s.cat.add_categories([0, 3, 4])
0    1
1    2
dtype: category
Categories (5, int64): [1, 2, 0, 3, 4]
>>> s
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]
>>> s.cat.add_categories([0, 3, 4], inplace=True)
>>> s
0    1
1    2
dtype: category
Categories (5, int64): [1, 2, 0, 3, 4]
as_ordered(inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Set the Categorical to be ordered.

Parameters
inplacebool, default False

Whether or not to set the ordered attribute in-place or return a copy of this categorical with ordered set to True.

Returns
Categorical

Ordered Categorical or None if inplace.

Examples

>>> import cudf
>>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category")
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.as_ordered()
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1 < 2 < 10]
>>> s.cat.as_ordered(inplace=True)
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1 < 2 < 10]
as_unordered(inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Set the Categorical to be unordered.

Parameters
inplacebool, default False

Whether or not to set the ordered attribute in-place or return a copy of this categorical with ordered set to False.

Returns
Categorical

Unordered Categorical or None if inplace.

Examples

>>> import cudf
>>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category")
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s = s.cat.as_ordered()
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1 < 2 < 10]
>>> s.cat.as_unordered()
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.as_unordered(inplace=True)
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
property categories

The categories of this categorical.

property codes

Return Series of codes as well as the index.

property ordered

Whether the categories have an ordered relationship.

remove_categories(removals: Any, inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Remove the specified categories.

removals must be included in the old categories. Values which were in the removed categories will be set to null.

Parameters
removalscategory or list-like of category

The categories which should be removed.

inplacebool, default False

Whether or not to remove the categories inplace or return a copy of this categorical with removed categories.

Returns
cat

Categorical with removed categories or None if inplace.

Examples

>>> import cudf
>>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category")
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.remove_categories([1])
0      10
1    <NA>
2    <NA>
3       2
4      10
5       2
6      10
dtype: category
Categories (2, int64): [2, 10]
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.remove_categories([10], inplace=True)
>>> s
0    <NA>
1       1
2       1
3       2
4    <NA>
5       2
6    <NA>
dtype: category
Categories (2, int64): [1, 2]
reorder_categories(new_categories: Any, ordered: bool = False, inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Reorder categories as specified in new_categories.

new_categories must include all old categories and no new category items.

Parameters
new_categoriesIndex-like

The categories in new order.

orderedbool, optional

Whether or not the categorical is treated as an ordered categorical. If not given, do not change the ordered information.

inplacebool, default False

Whether or not to reorder the categories inplace or return a copy of this categorical with reordered categories.

Returns
cat

Categorical with reordered categories or None if inplace.

Raises
ValueError

If the new categories do not contain all old category items or any new ones.

Examples

>>> import cudf
>>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category")
>>> s
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.reorder_categories([10, 1, 2])
0    10
1     1
2     1
3     2
4    10
5     2
6    10
dtype: category
Categories (3, int64): [10, 1, 2]
>>> s.cat.reorder_categories([10, 1])
ValueError: items in new_categories are not the same as in
old categories
set_categories(new_categories: Any, ordered: bool = False, rename: bool = False, inplace: bool = False)Optional[Union[cudf.Series, cudf.Index]]

Set the categories to the specified new_categories.

new_categories can include new categories (which will result in unused categories) or remove old categories (which results in values set to null). If rename==True, the categories will simply be renamed (fewer or more items than in old categories will result in values set to null or in unused categories respectively).

This method can be used to perform more than one action of adding, removing, and reordering simultaneously and is therefore faster than performing the individual steps via the more specialised methods.

On the other hand, this method does not do checks (e.g., whether the old categories are included in the new categories on a reorder), which can result in surprising changes.

Parameters
new_categorieslist-like

The categories in new order.

orderedbool, default None

Whether or not the categorical is treated as an ordered categorical. If not given, do not change the ordered information.

renamebool, default False

Whether or not the new_categories should be considered as a rename of the old categories or as reordered categories.

inplacebool, default False

Whether or not to reorder the categories in-place or return a copy of this categorical with reordered categories.

Returns
cat

Categorical with reordered categories or None if inplace.

Examples

>>> import cudf
>>> s = cudf.Series([1, 1, 2, 10, 2, 10], dtype='category')
>>> s
0     1
1     1
2     2
3    10
4     2
5    10
dtype: category
Categories (3, int64): [1, 2, 10]
>>> s.cat.set_categories([1, 10])
0       1
1       1
2    <NA>
3      10
4    <NA>
5      10
dtype: category
Categories (2, int64): [1, 10]
>>> s.cat.set_categories([1, 10], inplace=True)
>>> s
0       1
1       1
2    <NA>
3      10
4    <NA>
5      10
dtype: category
Categories (2, int64): [1, 10]
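
One way to picture why dropped categories turn into nulls is to model a categorical as a list of categories plus integer codes. The sketch below is a conceptual CPU model only: the helpers are hypothetical, and using -1 as the null code is an assumption of this sketch, not a statement about cuDF internals.

```python
def set_categories(values, new_categories):
    """Return (codes, categories); -1 marks a null value in this sketch."""
    position = {cat: i for i, cat in enumerate(new_categories)}
    codes = [position.get(v, -1) for v in values]
    return codes, list(new_categories)


def decode(codes, categories):
    """Map codes back to values, with None standing in for <NA>."""
    return [categories[c] if c >= 0 else None for c in codes]


codes, cats = set_categories([1, 1, 2, 10, 2, 10], [1, 10])
decoded = decode(codes, cats)
```

Values whose category is missing from new_categories have no code to point at, which is why they decode to null.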

GroupBy

class cudf.core.groupby.groupby.GroupBy(obj, by=None, level=None, sort=False, as_index=True, dropna=True)

Group a DataFrame or Series by a set of columns.

Parameters
byoptional

Specifies the grouping columns. Can be any of the following:

- A Python function called on each value of the object’s index
- A dict or Series that maps index labels to group names
- A cudf.Index object
- A str indicating a column name
- An array of the same length as the object
- A Grouper object
- A list of the above

levelint, level_name or list, optional

For objects with a MultiIndex, level can be used to specify grouping by one or more levels of the MultiIndex.

sortbool, default False

Sort the result by group keys. Unlike pandas, cudf defaults to False for better performance.

as_indexbool, optional

If as_index=True (default), the group names appear as the keys of the resulting DataFrame. If as_index=False, the groups are returned as ordinary columns of the resulting DataFrame, if they are named columns.

dropnabool, optional

If True (default), do not include the “null” group.

Methods

agg(func)

Apply aggregation(s) to the groups.

aggregate(func)

Apply aggregation(s) to the groups.

apply(function)

Apply a python transformation function over the grouped chunk.

apply_grouped(function, **kwargs)

Apply a transformation function over the grouped chunk.

nth(n)

Return the nth row from each group.

pipe(func, *args, **kwargs)

Apply a function func with arguments to this GroupBy object and return the function’s result.

rolling(*args, **kwargs)

Returns a RollingGroupby object that enables rolling window calculations on the groups.

size()

Return the size of each group.

collect

count

idxmax

idxmin

max

mean

median

min

nunique

quantile

std

sum

unique

var

agg(func)

Apply aggregation(s) to the groups.

Parameters
funcstr, callable, list or dict
Returns
A Series or DataFrame containing the combined results of the
aggregation.

Examples

>>> import cudf
>>> a = cudf.DataFrame(
...     {'a': [1, 1, 2], 'b': [1, 2, 3], 'c': [2, 2, 1]})
>>> a.groupby('a').agg('sum')
   b  c
a
2  3  1
1  3  4

Specifying a list of aggregations to perform on each column.

>>> a.groupby('a').agg(['sum', 'min'])
    b       c
  sum min sum min
a
2   3   3   1   1
1   3   1   4   2

Using a dict to specify aggregations to perform per column.

>>> a.groupby('a').agg({'a': 'max', 'b': ['min', 'mean']})
    a   b
  max min mean
a
2   2   3  3.0
1   1   1  1.5

Using lambdas/callables to specify aggregations taking parameters.

>>> f1 = lambda x: x.quantile(0.5); f1.__name__ = "q0.5"
>>> f2 = lambda x: x.quantile(0.75); f2.__name__ = "q0.75"
>>> a.groupby('a').agg([f1, f2])
     b          c
  q0.5 q0.75 q0.5 q0.75
a
1  1.5  1.75  2.0   2.0
2  3.0  3.00  1.0   1.0
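
The split-then-aggregate logic behind these results can be sketched in plain Python with a dictionary of groups. The helper groupby_agg is hypothetical and handles one value column at a time; it is a CPU illustration of the semantics rather than cuDF's implementation.

```python
from collections import defaultdict


def groupby_agg(rows, key, column, funcs):
    """Group rows by key, then apply each named function to the column values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        group: {name: func(vals) for name, func in funcs.items()}
        for group, vals in groups.items()
    }


rows = [
    {"a": 1, "b": 1, "c": 2},
    {"a": 1, "b": 2, "c": 2},
    {"a": 2, "b": 3, "c": 1},
]
b_stats = groupby_agg(rows, "a", "b", {"sum": sum, "min": min})
c_stats = groupby_agg(rows, "a", "c", {"sum": sum, "min": min})
```

The per-group numbers agree with the ['sum', 'min'] example above.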
aggregate(func)

Apply aggregation(s) to the groups.

Parameters
funcstr, callable, list or dict
Returns
A Series or DataFrame containing the combined results of the
aggregation.

Examples

>>> import cudf
>>> a = cudf.DataFrame(
...     {'a': [1, 1, 2], 'b': [1, 2, 3], 'c': [2, 2, 1]})
>>> a.groupby('a').agg('sum')
   b  c
a
2  3  1
1  3  4

Specifying a list of aggregations to perform on each column.

>>> a.groupby('a').agg(['sum', 'min'])
    b       c
  sum min sum min
a
2   3   3   1   1
1   3   1   4   2

Using a dict to specify aggregations to perform per column.

>>> a.groupby('a').agg({'a': 'max', 'b': ['min', 'mean']})
    a   b
  max min mean
a
2   2   3  3.0
1   1   1  1.5

Using lambdas/callables to specify aggregations taking parameters.

>>> f1 = lambda x: x.quantile(0.5); f1.__name__ = "q0.5"
>>> f2 = lambda x: x.quantile(0.75); f2.__name__ = "q0.75"
>>> a.groupby('a').agg([f1, f2])
     b          c
  q0.5 q0.75 q0.5 q0.75
a
1  1.5  1.75  2.0   2.0
2  3.0  3.00  1.0   1.0
apply(function)

Apply a python transformation function over the grouped chunk.

Parameters
funcfunction

The python transformation function that will be applied on the grouped chunk.

Examples

from cudf import DataFrame
df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group
def mult(df):
  df['out'] = df['key'] * df['val']
  return df

result = groups.apply(mult)
print(result)

Output:

   key  val  out
0    0    0    0
1    0    1    0
2    1    2    2
3    1    3    3
4    2    4    8
5    2    5   10
6    2    6   12
apply_grouped(function, **kwargs)

Apply a transformation function over the grouped chunk.

This uses numba’s CUDA JIT compiler to convert the Python transformation function into a CUDA kernel, thus will have a compilation overhead during the first run.

Parameters
funcfunction

The transformation function that will be executed on the CUDA GPU.

incols: list

A list of names of input columns.

outcols: dict

A dictionary of output column names and their dtype.

kwargsdict

Name-value pairs of extra arguments. These values are passed directly into the function.

Examples

from cudf import DataFrame
from numba import cuda
import numpy as np

df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group
def mult_add(key, val, out1, out2):
    for i in range(cuda.threadIdx.x, len(key), cuda.blockDim.x):
        out1[i] = key[i] * val[i]
        out2[i] = key[i] + val[i]

result = groups.apply_grouped(mult_add,
                              incols=['key', 'val'],
                              outcols={'out1': np.int32,
                                       'out2': np.int32},
                              # threads per block
                              tpb=8)

print(result)

Output:

   key  val out1 out2
0    0    0    0    0
1    0    1    0    1
2    1    2    2    3
3    1    3    3    4
4    2    4    8    6
5    2    5   10    7
6    2    6   12    8

import cudf
import numpy as np
from numba import cuda
import pandas as pd
from random import randint

# Create a random 15 row dataframe with one categorical
# feature and one random integer valued feature
df = cudf.DataFrame(
        {
            "cat": [1] * 5 + [2] * 5 + [3] * 5,
            "val": [randint(0, 100) for _ in range(15)],
        }
     )

# Group the dataframe by its categorical feature
groups = df.groupby("cat")

# Define a kernel which takes the moving average of a
# sliding window
def rolling_avg(val, avg):
    win_size = 3
    for i in range(cuda.threadIdx.x, len(val), cuda.blockDim.x):
        if i < win_size - 1:
            # If there is not enough data to fill the window,
            # take the average to be NaN
            avg[i] = np.nan
        else:
            total = 0
            for j in range(i - win_size + 1, i + 1):
                total += val[j]
            avg[i] = total / win_size

# Compute moving averages on all groups
results = groups.apply_grouped(rolling_avg,
                               incols=['val'],
                               outcols=dict(avg=np.float64))
print("Results:", results)

# Note this gives the same result as its pandas equivalent
pdf = df.to_pandas()
pd_results = pdf.groupby('cat')['val'].rolling(3).mean()

Output:

Results:
     cat  val                 avg
0    1   16
1    1   45
2    1   62                41.0
3    1   45  50.666666666666664
4    1   26  44.333333333333336
5    2    5
6    2   51
7    2   77  44.333333333333336
8    2    1                43.0
9    2   46  41.333333333333336
[5 more rows]

This is functionally equivalent to pandas.DataFrame.rolling.
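
Both kernels above iterate with range(cuda.threadIdx.x, len(...), cuda.blockDim.x), so each thread starts at its own index and steps by the number of threads in the block. The sketch below simulates that pattern on the CPU to show which elements each thread would handle; covered_indices is a hypothetical helper and no GPU is involved.

```python
def covered_indices(group_len, threads_per_block):
    """For each simulated thread, list the element indices it would process."""
    return {
        thread_idx: list(range(thread_idx, group_len, threads_per_block))
        for thread_idx in range(threads_per_block)
    }


work = covered_indices(group_len=5, threads_per_block=2)

# Every element of the group is visited exactly once across all threads.
visited = sorted(i for indices in work.values() for i in indices)
```

Because the stride equals the thread count, the loop stays correct whether a group is smaller or larger than the number of threads per block.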

nth(n)

Return the nth row from each group.

pipe(func, *args, **kwargs)

Apply a function func with arguments to this GroupBy object and return the function’s result.

Parameters
funcfunction

Function to apply to this GroupBy object or, alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the GroupBy object.

argsiterable, optional

Positional arguments passed into func.

kwargsmapping, optional

A dictionary of keyword arguments passed into func.

Returns
objectthe return type of func.

See also

cudf.core.series.Series.pipe

Apply a function with arguments to a series.

cudf.core.dataframe.DataFrame.pipe

Apply a function with arguments to a dataframe.

apply

Apply function to each group instead of to the full GroupBy object.

Examples

>>> import cudf
>>> df = cudf.DataFrame({'A': ['a', 'b', 'a', 'b'], 'B': [1, 2, 3, 4]})
>>> df
   A  B
0  a  1
1  b  2
2  a  3
3  b  4

To get the difference between each group's maximum and minimum value in one pass, you can do:

>>> df.groupby('A').pipe(lambda x: x.max() - x.min())
   B
A
a  2
b  2
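
The two accepted forms of func, a plain callable or a (callable, data_keyword) tuple, can be sketched with a small dispatcher. This is an illustrative model of the behavior described in the Parameters section, written for plain Python objects; pipe, spread, and scaled_spread are hypothetical names.

```python
def pipe(obj, func, *args, **kwargs):
    """Call func with obj, either positionally or under a named keyword."""
    if isinstance(func, tuple):
        func, data_keyword = func
        kwargs[data_keyword] = obj  # obj goes in as the named keyword argument
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)  # obj goes in as the first argument


def spread(values, scale=1):
    return (max(values) - min(values)) * scale


def scaled_spread(scale, data):
    return spread(data, scale)


direct = pipe([1, 3], spread, scale=10)
via_keyword = pipe([1, 3], (scaled_spread, "data"), 10)
```

The tuple form is useful when the function expects the piped object somewhere other than its first parameter.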
rolling(*args, **kwargs)

Returns a RollingGroupby object that enables rolling window calculations on the groups.

size()

Return the size of each group.

Window

class cudf.core.window.Rolling(obj, window, min_periods=None, center=False, axis=0, win_type=None)

Rolling window calculations.

Parameters
windowint or offset

Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. As opposed to a fixed window size, each window will be sized to accommodate observations within the time period specified by the offset.

min_periodsint, optional

The minimum number of observations in the window that are required to be non-null, so that the result is non-null. If not provided or None, min_periods is equal to the window size.

centerbool, optional

If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.

Returns
Rolling object.

Examples

>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4])

Rolling sum with window size 2.

>>> print(a.rolling(2).sum())
0
1    3
2    5
3
4
dtype: int64

Rolling sum with window size 2 and min_periods 1.

>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64

Rolling count with window size 3.

>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64

Rolling count with window size 3, but with the result set at the center of the window.

>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64

Rolling max with variable window size specified by an offset; only valid for datetime index.

>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000    1
2019-01-01T09:00:01.000    9
2019-01-01T09:00:02.000    9
2019-01-01T09:00:04.000    4
2019-01-01T09:00:07.000
2019-01-01T09:00:08.000    1
dtype: int64

Apply a custom function to each window with the apply method:

>>> import numpy as np
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     b = 0
...     for a in A:
...         b = b + math.sqrt(a)
...     return b
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64

This also works for rolling windows specified by an offset:

>>> import pandas as pd
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...          pd.Timestamp('20190101 09:00:00'),
...          pd.Timestamp('20190101 09:00:01'),
...          pd.Timestamp('20190101 09:00:02'),
...          pd.Timestamp('20190101 09:00:04'),
...          pd.Timestamp('20190101 09:00:07'),
...          pd.Timestamp('20190101 09:00:08')
...      ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64
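The offset-based results above are consistent with each window covering the half-open interval (t - offset, t]. The sketch below assumes that interval convention and reproduces the output in plain Python; `rolling_offset` and `sqrt_sum` are hypothetical helpers, and none of this reflects how cuDF computes the result internally.

```python
import math
from datetime import datetime, timedelta


def rolling_offset(times, values, offset, func):
    """Apply func to the values whose timestamp lies in (t - offset, t]."""
    out = []
    for t in times:
        window = [v for ts, v in zip(times, values) if t - offset < ts <= t]
        out.append(func(window))
    return out


def sqrt_sum(window):
    """Sum of square roots, mirroring some_func in the example above."""
    return sum(math.sqrt(v) for v in window)


base = datetime(2019, 1, 1, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 1, 2, 4, 7, 8)]
values = [16.0, 25.0, 36.0, 49.0, 64.0, 81.0]
print(rolling_offset(times, values, timedelta(seconds=2), sqrt_sum))
# [4.0, 9.0, 11.0, 7.0, 8.0, 17.0]
```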

Methods

apply(func, *args, **kwargs)

Counterpart of pandas.core.window.Rolling.apply.

count

max

mean

min

sum

apply(func, *args, **kwargs)

Counterpart of pandas.core.window.Rolling.apply.

Parameters
funcfunction

A user-defined function that takes a 1D array as input.

See also

cudf.core.series.Series.applymap

Apply an elementwise function to transform the values in the Column.

Notes

See the notes for cudf.core.series.Series.applymap().

General utility functions

cudf.testing.testing.assert_column_equal(left, right, check_dtype=True, check_column_type='equiv', check_less_precise=False, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_category_order=True, rtol=1e-05, atol=1e-08, obj='ColumnBase')

Check that left and right columns are equal

This function is intended to compare two columns and output any differences. Additional parameters allow varying the strictness of the equality checks performed.

Parameters
leftColumn

left Column to compare

rightColumn

right Column to compare

check_dtypebool, default True

Whether to check the Column dtype is identical.

check_column_typebool or {‘equiv’}, default ‘equiv’

Whether to check that the column class, dtype and inferred_type are identical. Currently this has no effect; it is accepted for compatibility with pandas.

check_less_precisebool or int, default False

Not yet supported

check_exactbool, default False

Whether to compare numbers exactly.

check_datetimelike_compatbool, default False

Compare datetime-like values that are comparable, ignoring dtype.

check_categoricalbool, default True

Whether to compare internal Categorical exactly.

check_category_orderbool, default True

Whether to compare category order of internal Categoricals

rtolfloat, default 1e-5

Relative tolerance. Only used when check_exact is False.

atolfloat, default 1e-8

Absolute tolerance. Only used when check_exact is False.

objstr, default ‘ColumnBase’

Specify object name being compared, internally used to show appropriate assertion message.
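To make the roles of rtol and atol concrete, the sketch below assumes the NumPy-style closeness rule, |left - right| <= atol + rtol * |right|. That cuDF applies exactly this rule is an assumption made for illustration; `is_close` is a hypothetical helper, not part of cudf.testing.

```python
def is_close(left, right, rtol=1e-05, atol=1e-08):
    """NumPy-style closeness test (assumed rule, see the note above)."""
    return abs(left - right) <= atol + rtol * abs(right)


print(is_close(1.0, 1.000001))  # True: difference is within the relative tolerance
print(is_close(1.0, 1.001))     # False: difference exceeds both tolerances
print(is_close(0.0, 5e-09))     # True: absolute tolerance covers values near zero
```

The absolute term matters mainly near zero, where a purely relative tolerance would reject almost any difference.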

cudf.testing.testing.assert_frame_equal(left, right, check_dtype=True, check_index_type='equiv', check_column_type='equiv', check_frame_type=True, check_names=True, by_blocks=False, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_like=False, rtol=1e-05, atol=1e-08, obj='DataFrame')

Check that left and right DataFrames are equal.

This function is intended to compare two DataFrames and output any differences. Additional parameters allow varying the strictness of the equality checks performed.

Parameters
leftDataFrame

left DataFrame to compare

rightDataFrame

right DataFrame to compare

check_dtypebool, default True

Whether to check the DataFrame dtype is identical.

check_index_typebool or {‘equiv’}, default ‘equiv’

Whether to check the Index class, dtype and inferred_type are identical.

check_column_typebool or {‘equiv’}, default ‘equiv’

Whether to check that the column class, dtype and inferred_type are identical. Currently this has no effect; it is accepted for compatibility with pandas.

check_frame_typebool, default True

Whether to check the DataFrame class is identical.

check_namesbool, default True

Whether to check that the names attribute for both the index and column attributes of the DataFrame is identical.

by_blocksbool, default False

Not supported

check_exactbool, default False

Whether to compare numbers exactly.

check_datetimelike_compatbool, default False

Compare datetime-like values that are comparable, ignoring dtype.

check_categoricalbool, default True

Whether to compare internal Categorical exactly.

check_likebool, default False

If True, ignore the order of the index and columns. Note: index labels must still match their respective rows (and column labels their respective columns); the same labels must carry the same data.

rtolfloat, default 1e-5

Relative tolerance. Only used when check_exact is False.

atolfloat, default 1e-8

Absolute tolerance. Only used when check_exact is False.

objstr, default ‘DataFrame’

Specify object name being compared, internally used to show appropriate assertion message.

Examples

>>> import cudf
>>> df1 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[1, 2])
>>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[2, 3])
>>> cudf.testing.assert_frame_equal(df1, df2)
......
......
AssertionError: ColumnBase are different

values are different (100.0 %)
[left]:  [1 2]
[right]: [2 3]
>>> df2 = cudf.DataFrame({"a":[1, 2], "c":[1.0, 2.0]}, index=[1, 2])
>>> cudf.testing.assert_frame_equal(df1, df2)
......
......
AssertionError: DataFrame.columns are different

DataFrame.columns values are different (50.0 %)
[left]: Index(['a', 'b'], dtype='object')
[right]: Index(['a', 'c'], dtype='object')
>>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 3.0]}, index=[1, 2])
>>> cudf.testing.assert_frame_equal(df1, df2)
......
......
AssertionError: Column name="b" are different

values are different (50.0 %)
[left]:  [1. 2.]
[right]: [1. 3.]

This will pass without any hitch:

>>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[1, 2])
>>> cudf.testing.assert_frame_equal(df1, df2)
cudf.testing.testing.assert_index_equal(left, right, exact='equiv', check_names: bool = True, check_less_precise: Union[bool, int] = False, check_exact: bool = True, check_categorical: bool = True, check_order: bool = True, rtol: float = 1e-05, atol: float = 1e-08, obj: str = 'Index')

Check that left and right Index are equal

This function is intended to compare two Index and output any differences. Additional parameters allow varying the strictness of the equality checks performed.

Parameters
leftIndex

left Index to compare

rightIndex

right Index to compare

exactbool or {‘equiv’}, default ‘equiv’

Whether to check the Index class, dtype and inferred_type are identical. If ‘equiv’, then RangeIndex can be substituted for Int8Index, Int16Index, Int32Index, Int64Index as well.

check_namesbool, default True

Whether to check the names attribute.

check_less_precisebool or int, default False

Not yet supported

check_exactbool, default True

Whether to compare numbers exactly.

check_categoricalbool, default True

Whether to compare internal Categorical exactly.

check_orderbool, default True

Whether to compare the order of index entries as well as their values. If True, both indexes must contain the same elements, in the same order. If False, both indexes must contain the same elements, but in any order.

rtolfloat, default 1e-5

Relative tolerance. Only used when check_exact is False.

atolfloat, default 1e-8

Absolute tolerance. Only used when check_exact is False.

objstr, default ‘Index’

Specify object name being compared, internally used to show appropriate assertion message.

Examples

>>> import cudf
>>> id1 = cudf.Index([1, 2, 3, 4], name="a")
>>> id2 = cudf.Index([1, 2, 3, 5])
>>> cudf.testing.assert_index_equal(id1, id2)
......
......
AssertionError: ColumnBase are different

values are different (25.0 %)
[left]:  [1 2 3 4]
[right]: [1 2 3 5]
>>> id2 = cudf.Index([1, 2, 3, 4], name="b")
>>> cudf.testing.assert_index_equal(id1, id2)
......
......
AssertionError: Index are different

name mismatch
[left]:  a
[right]: b

This will pass without any hitch:

>>> id2 = cudf.Index([1, 2, 3, 4], name="a")
>>> cudf.testing.assert_index_equal(id1, id2)
cudf.testing.testing.assert_series_equal(left, right, check_dtype=True, check_index_type='equiv', check_series_type=True, check_less_precise=False, check_names=True, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_category_order=True, rtol=1e-05, atol=1e-08, obj='Series')

Check that left and right Series are equal

This function is intended to compare two Series and output any differences. Additional parameters allow varying the strictness of the equality checks performed.

Parameters
leftSeries

left Series to compare

rightSeries

right Series to compare

check_dtypebool, default True

Whether to check the Series dtype is identical.

check_index_typebool or {‘equiv’}, default ‘equiv’

Whether to check the Index class, dtype and inferred_type are identical.

check_series_typebool, default True

Whether to check that the Series class, dtype and inferred_type are identical. Currently this has no effect; it is accepted for compatibility with pandas.

check_less_precisebool or int, default False

Not yet supported

check_namesbool, default True

Whether to check that the names attribute for both the index and column attributes of the Series is identical.

check_exactbool, default False

Whether to compare numbers exactly.

check_datetimelike_compatbool, default False

Compare datetime-like values that are comparable, ignoring dtype.

check_categoricalbool, default True

Whether to compare internal Categorical exactly.

check_category_orderbool, default True

Whether to compare category order of internal Categoricals

rtolfloat, default 1e-5

Relative tolerance. Only used when check_exact is False.

atolfloat, default 1e-8

Absolute tolerance. Only used when check_exact is False.

objstr, default ‘Series’

Specify object name being compared, internally used to show appropriate assertion message.

Examples

>>> import cudf
>>> sr1 = cudf.Series([1, 2, 3, 4], name="a")
>>> sr2 = cudf.Series([1, 2, 3, 5], name="b")
>>> cudf.testing.assert_series_equal(sr1, sr2)
......
......
AssertionError: ColumnBase are different

values are different (25.0 %)
[left]:  [1 2 3 4]
[right]: [1 2 3 5]
>>> sr2 = cudf.Series([1, 2, 3, 4], name="b")
>>> cudf.testing.assert_series_equal(sr1, sr2)
......
......
AssertionError: Series are different

name mismatch
[left]:  a
[right]: b

This will pass without any hitch:

>>> sr2 = cudf.Series([1, 2, 3, 4], name="a")
>>> cudf.testing.assert_series_equal(sr1, sr2)

Timedelta Properties

class cudf.core.series.TimedeltaProperties(series)

Accessor object for timedeltalike properties of the Series values.

Returns
Returns a Series indexed like the original Series.

Examples

>>> import cudf
>>> seconds_series = cudf.Series([1, 2, 3], dtype='timedelta64[s]')
>>> seconds_series
0    00:00:01
1    00:00:02
2    00:00:03
dtype: timedelta64[s]
>>> seconds_series.dt.seconds
0    1
1    2
2    3
dtype: int64
>>> series = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656,
...     3244334234], dtype='timedelta64[ms]')
>>> series
0      141 days 13:35:12.123
1       14 days 06:00:31.231
2    13000 days 10:12:48.712
3        0 days 00:35:35.656
4       37 days 13:12:14.234
dtype: timedelta64[ms]
>>> series.dt.components
    days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0    141     13       35       12           123             0            0
1     14      6        0       31           231             0            0
2  13000     10       12       48           712             0            0
3      0      0       35       35           656             0            0
4     37     13       12       14           234             0            0
>>> series.dt.days
0      141
1       14
2    13000
3        0
4       37
dtype: int64
>>> series.dt.seconds
0    48912
1    21631
2    36768
3     2135
4    47534
dtype: int64
>>> series.dt.microseconds
0    123000
1    231000
2    712000
3    656000
4    234000
dtype: int64
>>> series.dt.nanoseconds
0    0
1    0
2    0
3    0
4    0
dtype: int64
Attributes
components

Return a DataFrame of the components of the Timedeltas.

days

Number of days.

microseconds

Number of microseconds (>= 0 and less than 1 second).

nanoseconds

Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.

seconds

Number of seconds (>= 0 and less than 1 day).
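The relationship between these attributes can be checked with the standard library's datetime.timedelta, which exposes days, seconds (less than 1 day) and microseconds (less than 1 second) in the same way. The sketch below is a CPU-only illustration; `components` is a hypothetical helper, and nanoseconds are omitted because datetime.timedelta only has microsecond resolution.

```python
from datetime import timedelta


def components(td):
    """Split a timedelta into the fields shown by dt.components."""
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    milliseconds, microseconds = divmod(td.microseconds, 1000)
    return {
        "days": td.days,
        "hours": hours,
        "minutes": minutes,
        "seconds": seconds,
        "milliseconds": milliseconds,
        "microseconds": microseconds,
    }


# First value of the example Series, interpreted as milliseconds.
td = timedelta(milliseconds=12231312123)
print(td.days, td.seconds, td.microseconds)  # 141 48912 123000
print(components(td))
# {'days': 141, 'hours': 13, 'minutes': 35, 'seconds': 12,
#  'milliseconds': 123, 'microseconds': 0}
```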

property components

Return a DataFrame of the components of the Timedeltas.

Returns
DataFrame

Examples

>>> s = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656, 3244334234], dtype='timedelta64[ms]')
>>> s
0      141 days 13:35:12.123
1       14 days 06:00:31.231
2    13000 days 10:12:48.712
3        0 days 00:35:35.656
4       37 days 13:12:14.234
dtype: timedelta64[ms]
>>> s.dt.components
    days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0    141     13       35       12           123             0            0
1     14      6        0       31           231             0            0
2  13000     10       12       48           712             0            0
3      0      0       35       35           656             0            0
4     37     13       12       14           234             0            0
property days

Number of days.

Returns
Series

Examples

>>> import cudf
>>> s = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656,
...     3244334234], dtype='timedelta64[ms]')
>>> s
0      141 days 13:35:12.123
1       14 days 06:00:31.231
2    13000 days 10:12:48.712
3        0 days 00:35:35.656
4       37 days 13:12:14.234
dtype: timedelta64[ms]
>>> s.dt.days
0      141
1       14
2    13000
3        0
4       37
dtype: int64
property microseconds

Number of microseconds (>= 0 and less than 1 second).

Returns
Series

Examples

>>> import cudf
>>> s = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656,
...     3244334234], dtype='timedelta64[ms]')
>>> s
0      141 days 13:35:12.123
1       14 days 06:00:31.231
2    13000 days 10:12:48.712
3        0 days 00:35:35.656
4       37 days 13:12:14.234
dtype: timedelta64[ms]
>>> s.dt.microseconds
0    123000
1    231000
2    712000
3    656000
4    234000
dtype: int64
property nanoseconds

Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.

Returns
Series

Examples

>>> import cudf
>>> s = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656,
...     3244334234], dtype='timedelta64[ns]')
>>> s
0    00:00:12.231312123
1    00:00:01.231231231
2    00:18:43.236768712
3    00:00:00.002135656
4    00:00:03.244334234
dtype: timedelta64[ns]
>>> s.dt.nanoseconds
0    123
1    231
2    712
3    656
4    234
dtype: int64
property seconds

Number of seconds (>= 0 and less than 1 day).

Returns
Series

Examples

>>> import cudf
>>> s = cudf.Series([12231312123, 1231231231, 1123236768712, 2135656,
...     3244334234], dtype='timedelta64[ms]')
>>> s
0      141 days 13:35:12.123
1       14 days 06:00:31.231
2    13000 days 10:12:48.712
3        0 days 00:35:35.656
4       37 days 13:12:14.234
dtype: timedelta64[ms]
>>> s.dt.seconds
0    48912
1    21631
2    36768
3     2135
4    47534
dtype: int64
>>> s.dt.microseconds
0    123000
1    231000
2    712000
3    656000
4    234000
dtype: int64

Datetime Properties

class cudf.core.series.DatetimeProperties(series)

Accessor object for datetimelike properties of the Series values.

Returns
Returns a Series indexed like the original Series.

Examples

>>> import cudf
>>> import pandas as pd
>>> seconds_series = cudf.Series(pd.date_range("2000-01-01", periods=3,
...     freq="s"))
>>> seconds_series
0   2000-01-01 00:00:00
1   2000-01-01 00:00:01
2   2000-01-01 00:00:02
dtype: datetime64[ns]
>>> seconds_series.dt.second
0    0
1    1
2    2
dtype: int16
>>> hours_series = cudf.Series(pd.date_range("2000-01-01", periods=3,
...     freq="h"))
>>> hours_series
0   2000-01-01 00:00:00
1   2000-01-01 01:00:00
2   2000-01-01 02:00:00
dtype: datetime64[ns]
>>> hours_series.dt.hour
0    0
1    1
2    2
dtype: int16
>>> weekday_series = cudf.Series(pd.date_range("2000-01-01", periods=3,
...     freq="q"))
>>> weekday_series
0   2000-03-31
1   2000-06-30
2   2000-09-30
dtype: datetime64[ns]
>>> weekday_series.dt.weekday
0    4
1    4
2    5
dtype: int16
Attributes
day

The day of the datetime.

dayofweek

The day of the week with Monday=0, Sunday=6.

hour

The hours of the datetime.

minute

The minutes of the datetime.

month

The month as January=1, December=12.

second

The seconds of the datetime.

weekday

The day of the week with Monday=0, Sunday=6.

year

The year of the datetime.

Methods

strftime(date_format, *args, **kwargs)

Convert to Series using specified date_format.
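The Monday=0, Sunday=6 convention used by dayofweek and weekday is the same one used by date.weekday() in the Python standard library, so expected values can be cross-checked on the CPU. The dates below are the quarter ends from the example above; this is only a sanity-check sketch and does not use cuDF.

```python
from datetime import date

quarter_ends = [date(2000, 3, 31), date(2000, 6, 30), date(2000, 9, 30)]

# date.weekday() returns 0 for Monday through 6 for Sunday.
print([d.weekday() for d in quarter_ends])  # [4, 4, 5]
print(date(2017, 1, 1).weekday())           # 6, a Sunday
```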

property day

The day of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="D"))
>>> datetime_series
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]
>>> datetime_series.dt.day
0    1
1    2
2    3
dtype: int16
property dayofweek

The day of the week with Monday=0, Sunday=6.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range('2016-12-31',
...     '2017-01-08', freq='D'))
>>> datetime_series
0   2016-12-31
1   2017-01-01
2   2017-01-02
3   2017-01-03
4   2017-01-04
5   2017-01-05
6   2017-01-06
7   2017-01-07
8   2017-01-08
dtype: datetime64[ns]
>>> datetime_series.dt.dayofweek
0    5
1    6
2    0
3    1
4    2
5    3
6    4
7    5
8    6
dtype: int16
property hour

The hours of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="h"))
>>> datetime_series
0   2000-01-01 00:00:00
1   2000-01-01 01:00:00
2   2000-01-01 02:00:00
dtype: datetime64[ns]
>>> datetime_series.dt.hour
0    0
1    1
2    2
dtype: int16
property minute

The minutes of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="T"))
>>> datetime_series
0   2000-01-01 00:00:00
1   2000-01-01 00:01:00
2   2000-01-01 00:02:00
dtype: datetime64[ns]
>>> datetime_series.dt.minute
0    0
1    1
2    2
dtype: int16
property month

The month as January=1, December=12.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="M"))
>>> datetime_series
0   2000-01-31
1   2000-02-29
2   2000-03-31
dtype: datetime64[ns]
>>> datetime_series.dt.month
0    1
1    2
2    3
dtype: int16
property second

The seconds of the datetime.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="s"))
>>> datetime_series
0   2000-01-01 00:00:00
1   2000-01-01 00:00:01
2   2000-01-01 00:00:02
dtype: datetime64[ns]
>>> datetime_series.dt.second
0    0
1    1
2    2
dtype: int16
strftime(date_format, *args, **kwargs)

Convert to Series using specified date_format.

Return a Series of formatted strings specified by date_format, which supports the same format codes as the Python standard library. Details of the format codes can be found in the Python documentation for strftime() and strptime().

Parameters
date_formatstr

Date format string (e.g. “%Y-%m-%d”).

Returns
Series

Series of formatted strings.

Notes

The following date format identifiers are not yet supported: %a, %A, %w, %b, %B, %U, %W, %c, %x, %X, %G, %u, %V
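Because the format codes follow the Python standard library, a format string can be tried out with datetime.strftime before applying it to a Series. The sketch below uses only directives that are not in the unsupported list above (%Y, %m, %d); it runs on the CPU and is not cuDF code.

```python
from datetime import datetime

stamps = [datetime(2000, 3, 31), datetime(2000, 6, 30), datetime(2000, 9, 30)]

# Try each format string against the same timestamps.
for fmt in ("%Y-%m-%d", "%Y %d %m", "%Y / %d / %m"):
    print([ts.strftime(fmt) for ts in stamps])
# ['2000-03-31', '2000-06-30', '2000-09-30']
# ['2000 31 03', '2000 30 06', '2000 30 09']
# ['2000 / 31 / 03', '2000 / 30 / 06', '2000 / 30 / 09']
```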

Examples

>>> import cudf
>>> import pandas as pd
>>> weekday_series = cudf.Series(pd.date_range("2000-01-01", periods=3,
...      freq="q"))
>>> weekday_series
0   2000-03-31
1   2000-06-30
2   2000-09-30
dtype: datetime64[ns]
>>> weekday_series.dt.strftime("%Y-%m-%d")
0    2000-03-31
1    2000-06-30
2    2000-09-30
dtype: object
>>> weekday_series.dt.strftime("%Y %d %m")
0    2000 31 03
1    2000 30 06
2    2000 30 09
dtype: object
>>> weekday_series.dt.strftime("%Y / %d / %m")
0    2000 / 31 / 03
1    2000 / 30 / 06
2    2000 / 30 / 09
dtype: object
property weekday

The day of the week with Monday=0, Sunday=6.

Examples

>>> import pandas as pd
>>> import cudf
>>> datetime_series = cudf.Series(pd.date_range('2016-12-31',
...     '2017-01-08', freq='D'))
>>> datetime_series
0   2016-12-31
1   2017-01-01
2   2017-01-02
3   2017-01-03
4   2017-01-04
5   2017-01-05
6   2017-01-06
7   2017-01-07
8   2017-01-08
dtype: datetime64[ns]
>>> datetime_series.dt.weekday
0    5
1    6
2    0
3    1
4    2
5    3
6    4
7    5
8    6
dtype: int16
property year

The year of the datetime.

Examples

>>> import cudf
>>> import pandas as pd
>>> datetime_series = cudf.Series(pd.date_range("2000-01-01",
...         periods=3, freq="Y"))
>>> datetime_series
0   2000-12-31
1   2001-12-31
2   2002-12-31
dtype: datetime64[ns]
>>> datetime_series.dt.year
0    2000
1    2001
2    2002
dtype: int16

IO

cudf.io.csv.read_csv(filepath_or_buffer, lineterminator='\n', quotechar='"', quoting=0, doublequote=True, header='infer', mangle_dupe_cols=True, usecols=None, sep=',', delimiter=None, delim_whitespace=False, skipinitialspace=False, names=None, dtype=None, skipfooter=0, skiprows=0, dayfirst=False, compression='infer', thousands=None, decimal='.', true_values=None, false_values=None, nrows=None, byte_range=None, skip_blank_lines=True, parse_dates=None, comment=None, na_values=None, keep_default_na=True, na_filter=True, prefix=None, index_col=None, **kwargs)

Load a comma-separated values (CSV) dataset into a DataFrame.

Parameters
filepath_or_bufferstr, path object, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), or any object with a read() method (such as builtin open() file handler function or StringIO).

sepchar, default ‘,’

Delimiter to be used.

delimiterchar, default None

Alternative argument name for sep.

delim_whitespacebool, default False

Determines whether to use whitespace as delimiter.

lineterminatorchar, default ‘\n’

Character to indicate end of line.

skipinitialspacebool, default False

Skip spaces after delimiter.

nameslist of str, default None

List of column names to be used.

dtypetype, str, list of types, or dict of column -> type, default None

Data type(s) for data or columns. If dtype is a type/str, all columns are mapped to the particular type passed. If list, types are applied in the same order as the column names. If dict, types are mapped to the column names, e.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘float’}. If None, dtypes are inferred from the dataset. Use str to preserve the data as-is, without inferring or interpreting a dtype.

quotecharchar, default ‘"’

Character to indicate start and end of quote item.

quotingstr or int, default 0

Controls quoting behavior. Set to one of 0 (csv.QUOTE_MINIMAL), 1 (csv.QUOTE_ALL), 2 (csv.QUOTE_NONNUMERIC) or 3 (csv.QUOTE_NONE). Quoting is enabled with all values except 3.
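The integer codes listed above are the values of the quoting constants in Python's standard csv module, so the named constants can be passed instead of bare integers. The short check below only prints that mapping; it does not call cuDF.

```python
import csv

quoting_modes = {
    "QUOTE_MINIMAL": csv.QUOTE_MINIMAL,
    "QUOTE_ALL": csv.QUOTE_ALL,
    "QUOTE_NONNUMERIC": csv.QUOTE_NONNUMERIC,
    "QUOTE_NONE": csv.QUOTE_NONE,
}

# Print each constant name with its integer value.
for name, value in quoting_modes.items():
    print(value, name)
# 0 QUOTE_MINIMAL
# 1 QUOTE_ALL
# 2 QUOTE_NONNUMERIC
# 3 QUOTE_NONE
```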

doublequotebool, default True

When quoting is enabled, indicates whether to interpret two consecutive quotechar characters inside a field as a single quotechar.

headerint, default ‘infer’

Row number to use as the column names. Default behavior is to infer the column names: if no names are passed, header=0; if column names are passed explicitly, header=None.

usecolslist of int or str, default None

Returns subset of the columns given in the list. All elements must be either integer indices (column number) or strings that correspond to column names

mangle_dupe_colsboolean, default True

Duplicate columns will be specified as ‘X’, ‘X.1’, …, ‘X.N’.

skiprowsint, default 0

Number of rows to be skipped from the start of file.

skipfooterint, default 0

Number of rows to be skipped at the bottom of file.

compression{‘infer’, ‘gzip’, ‘zip’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’, then detect compression from the following extensions: ‘.gz’,‘.zip’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in, otherwise the first non-zero-sized file will be used. Set to None for no decompression.

decimalchar, default ‘.’

Character used as a decimal point.

thousandschar, default None

Character used as a thousands delimiter.

true_valueslist, default None

Values to consider as boolean True

false_valueslist, default None

Values to consider as boolean False

nrowsint, default None

If specified, maximum number of rows to read

byte_rangelist or tuple, default None

Byte range within the input file to be read. The first number is the offset in bytes, the second number is the range size in bytes. Set the size to zero to read all data after the offset location. Reads the row that starts before or at the end of the range, even if it ends after the end of the range.
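One way byte_range might be used is to read a large file in pieces. The sketch below only computes (offset, size) pairs that cover a file of a given length; `make_byte_ranges` is a hypothetical helper, the sizes are made up, and how header rows should be handled for chunks after the first is not covered here.

```python
def make_byte_ranges(total_bytes, chunk_bytes):
    """Return (offset, size) pairs that together cover [0, total_bytes)."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    ranges = []
    for offset in range(0, total_bytes, chunk_bytes):
        # The last chunk may be shorter; size is never 0, since a size of 0
        # would mean "read everything after the offset".
        size = min(chunk_bytes, total_bytes - offset)
        ranges.append((offset, size))
    return ranges


print(make_byte_ranges(100, 40))  # [(0, 40), (40, 40), (80, 20)]
```

Each pair could then be passed as the byte_range argument of a separate read_csv call.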

skip_blank_linesbool, default True

If True, discard and do not parse empty lines. If False, interpret empty lines as NaN values.

parse_dateslist of int or names, default None

If list of columns, then attempt to parse each entry as a date. Columns may not always be recognized as dates, for instance due to unusual or non-standard formats. To guarantee a date and increase parsing speed, explicitly specify dtype=’date’ for the desired columns.

commentchar, default None

Character used as a comments indicator. If found at the beginning of a line, the line will be ignored altogether.

na_valuesscalar, str, or list-like, optional

Additional strings to recognize as nulls. By default the following values are interpreted as nulls: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_nabool, default True

Whether or not to include the default NA values when parsing the data.

na_filterbool, default True

Detect missing values (empty strings and the values in na_values). Passing False can improve performance.

prefixstr, default None

Prefix to add to column numbers when parsing without a header row

index_colint, string or False, default None

Column to use as the row labels of the DataFrame. Passing index_col=False explicitly disables index column inference and discards the last column.

Returns
GPU DataFrame object.

Notes

  • cuDF supports local and remote data stores. See configuration details for available sources here.

Examples

Create a test csv file

>>> import cudf
>>> filename = 'foo.csv'
>>> lines = [
...   "num1,datetime,text",
...   "123,2018-11-13T12:00:00,abc",
...   "456,2018-11-14T12:35:01,def",
...   "789,2018-11-15T18:02:59,ghi"
... ]
>>> with open(filename, 'w') as fp:
...     fp.write('\n'.join(lines)+'\n')

Read the file with cudf.read_csv

>>> cudf.read_csv(filename)
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.csv.to_csv(df, path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator='\n', chunksize=None, encoding=None, compression=None, **kwargs)

Write a dataframe to csv file format.

Parameters
dfDataFrame

DataFrame object to be written to csv

path_or_bufstr or file handle, default None

File path or object, if None is provided the result is returned as a string.

sepchar, default ‘,’

Delimiter to be used.

na_repstr, default ‘’

String to use for null entries

columnslist of str, optional

Columns to write

headerbool, default True

Write out the column names

indexbool, default True

Write out the index as a column

line_terminatorchar, default ‘\n’

chunksizeint or None, default None

Rows to write at a time

encoding: str, default ‘utf-8’

A string representing the encoding to use in the output file. Only ‘utf-8’ is currently supported.

compression: str, None

A string representing the compression scheme to use in the output file. Compression while writing csv is not currently supported.

Returns
None or str

If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

Notes

  • Follows the standard of Pandas csv.QUOTE_NONNUMERIC for all output.

  • If to_csv leads to memory errors consider setting the chunksize argument.
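For readers unfamiliar with csv.QUOTE_NONNUMERIC, the sketch below shows what that quoting mode does in Python's standard csv module: non-numeric fields are quoted and numeric fields are written bare. It illustrates the quoting convention only; whether cuDF quotes every field (for example, the header row) identically is not confirmed here.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerow(["x", "y", "z"])  # header: all strings, so all quoted
writer.writerow([0, 1.0, "a"])    # numbers stay bare, the string is quoted
text = buffer.getvalue()
print(text)
# "x","y","z"
# 0,1.0,"a"
```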

Examples

Write a dataframe to csv.

>>> import cudf
>>> filename = 'foo.csv'
>>> df = cudf.DataFrame({'x': [0, 1, 2, 3],
...                      'y': [1.0, 3.3, 2.2, 4.4],
...                      'z': ['a', 'b', 'c', 'd']})
>>> df = df.set_index([3, 2, 1, 0])
>>> df.to_csv(filename)
cudf.io.parquet.merge_parquet_filemetadata(filemetadata_list)

Merge multiple parquet metadata blobs

Parameters
filemetadata_listlist

List of buffers returned by to_parquet

Returns
Combined parquet metadata blob
cudf.io.parquet.read_parquet(filepath_or_buffer, engine='cudf', columns=None, filters=None, row_groups=None, skiprows=None, num_rows=None, strings_to_categorical=False, use_pandas_metadata=True, *args, **kwargs)

Load a Parquet dataset into a DataFrame

Parameters
filepath_or_bufferstr, path object, bytes, file-like object, or a list of such objects

Contains one or more of the following: either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).

engine{ ‘cudf’, ‘pyarrow’ }, default ‘cudf’

Parser engine to use.

columnslist, default None

If not None, only these columns will be read.

filterslist of tuples or list of lists of tuples, default None

If not None, specifies a filter predicate used to filter out row groups using statistics stored for each row group as Parquet metadata. Row groups that do not match the given filter predicate are not read. The predicate is expressed in disjunctive normal form (DNF) like [[(‘x’, ‘=’, 0), …], …]. DNF allows arbitrary boolean logical combinations of single column predicates. The innermost tuples each describe a single column predicate. The list of inner predicates is interpreted as a conjunction (AND), forming a more selective and multiple column predicate. Finally, the most outer list combines these filters as a disjunction (OR). Predicates may also be passed as a list of tuples. This form is interpreted as a single conjunction. To express OR in predicates, one must use the (preferred) notation of list of lists of tuples.
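The AND/OR structure of a DNF filter can be sketched by evaluating it against a single record in plain Python. This is an illustration of the predicate logic only: the actual reader applies the predicate to row-group statistics rather than to individual rows, `matches` is a hypothetical helper, and the set of operators accepted here is an assumption.

```python
import operator

# Comparison operators accepted by this sketch (an assumed set).
OPS = {
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def matches(record, filters):
    """Evaluate DNF filters (OR of ANDs) against one dict-like record."""
    # A flat list of tuples is treated as a single conjunction.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    return any(
        all(OPS[op](record[col], val) for col, op, val in conjunction)
        for conjunction in filters
    )


dnf = [[("x", "=", 0), ("y", ">", 3)], [("x", "=", 9)]]
print(matches({"x": 0, "y": 5}, dnf))  # True: first conjunction holds
print(matches({"x": 9, "y": 0}, dnf))  # True: second conjunction holds
print(matches({"x": 0, "y": 1}, dnf))  # False: neither conjunction holds
```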

row_groupsint, list, or list of lists, default None

If not None, specifies, for each input file, which row groups to read. If reading multiple inputs, a list of lists should be passed, one list for each input.

skiprowsint, default None

If not None, the number of rows to skip from the start of the file.

num_rowsint, default None

If not None, the total number of rows to read.

strings_to_categoricalboolean, default False

If True, return string columns as GDF_CATEGORY dtype; if False, return them as GDF_STRING dtype.

use_pandas_metadataboolean, default True

If True and dataset has custom PANDAS schema metadata, ensure that index columns are also loaded.

Returns
DataFrame

Notes

  • cuDF supports local and remote data stores. See configuration details for available sources here.

Examples

>>> import cudf
>>> df = cudf.read_parquet(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.parquet.read_parquet_metadata(path)

Read a Parquet file’s metadata and schema

Parameters
pathstring or path object

Path of file to be read

Returns
Total number of rows
Number of row groups
List of column names

Examples

>>> import cudf
>>> num_rows, num_row_groups, names = cudf.io.read_parquet_metadata(filename)
>>> df = [cudf.read_parquet(filename, row_groups=i) for i in range(num_row_groups)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.parquet.to_parquet(df, path, engine='cudf', compression='snappy', index=None, partition_cols=None, partition_file_name=None, statistics='ROWGROUP', metadata_file_path=None, int96_timestamps=False, *args, **kwargs)

Write a DataFrame to the parquet format.

Parameters
pathstr

File path or Root Directory path. Will be used as Root Directory path while writing a partitioned dataset.

compression{‘snappy’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

indexbool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the engine’s default behavior will be used. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

partition_colslist, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given.

partition_file_namestr, optional, default None

File name to use for partitioned datasets. Different partitions will be written to different directories, but all files will have this name. If nothing is specified, a random uuid4 hex string will be used for each file.

int96_timestampsbool, default False

If True, write timestamps in int96 format. This will convert timestamps from timestamp[ns], timestamp[ms], timestamp[s], and timestamp[us] to the int96 format, which is the number of Julian days and the number of nanoseconds since midnight. If False, timestamps will not be altered.
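The int96 representation described above (a Julian day number plus nanoseconds since midnight) can be illustrated with a short standard-library sketch. The function name to_int96_parts is hypothetical, and the example covers only the split into two values, not how cuDF packs them into bytes.

```python
from datetime import datetime

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def to_int96_parts(ts):
    """Split a naive datetime into (Julian day number, nanoseconds since midnight)."""
    days_since_epoch = (ts.date() - datetime(1970, 1, 1).date()).days
    julian_day = days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH
    micros = (ts.hour * 3600 + ts.minute * 60 + ts.second) * 1_000_000 + ts.microsecond
    return julian_day, micros * 1_000


epoch_parts = to_int96_parts(datetime(1970, 1, 1))
noon_parts = to_int96_parts(datetime(2018, 10, 7, 12, 0, 0))
```

Python's datetime carries only microsecond precision, so this sketch cannot represent every timestamp[ns] value exactly.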

cudf.io.parquet.write_to_dataset(df, root_path, filename=None, partition_cols=None, fs=None, preserve_index=False, return_metadata=False, **kwargs)

Wraps to_parquet to write partitioned Parquet datasets. For each combination of partition group and value, subdirectories are created as follows:

root_dir/
    group=value1
        <filename>.parquet
    ...
    group=valueN
        <filename>.parquet
Parameters
dfcudf.DataFrame
root_pathstring

The root directory of the dataset

filenamestring, default None

The file name to use (within each partition directory). If None, a random uuid4 hex string will be used for each file name.

fsFileSystem, default None

If nothing passed, paths assumed to be found in the local on-disk filesystem

preserve_indexbool, default False

Preserve index values in each parquet file.

partition_colslist

Column names by which to partition the dataset. Columns are partitioned in the order they are given.

return_metadatabool, default False

Return parquet metadata for written data. Returned metadata will include the file-path metadata (relative to root_path).

**kwargsdict

kwargs for to_parquet function.
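As a rough illustration of the directory layout shown above, the following standard-library sketch maps each distinct combination of partition values to an output path. It is not the cuDF implementation: the function name partition_paths is hypothetical, and both the nesting of one directory level per partition column and the ".parquet" suffix on the generated default name are assumptions.

```python
import uuid
from pathlib import PurePosixPath


def partition_paths(rows, root_path, partition_cols, filename=None):
    """Map each distinct combination of partition values to its output file path."""
    paths = {}
    for row in rows:
        key = tuple(row[col] for col in partition_cols)
        if key in paths:
            continue  # this partition already has a file assigned
        subdir = PurePosixPath(root_path)
        # One "column=value" directory level per partition column, in the given order.
        for col, value in zip(partition_cols, key):
            subdir = subdir / f"{col}={value}"
        # Assumed default: a random uuid4 hex string as the file name.
        name = filename if filename is not None else uuid.uuid4().hex + ".parquet"
        paths[key] = str(subdir / name)
    return paths


rows = [
    {"year": 2020, "city": "Oslo", "sales": 10},
    {"year": 2020, "city": "Rome", "sales": 7},
    {"year": 2021, "city": "Oslo", "sales": 12},
    {"year": 2020, "city": "Oslo", "sales": 3},
]
paths = partition_paths(rows, "root_dir", ["year", "city"], filename="part.parquet")
```

With a fixed filename, every partition directory holds a file of the same name; with the default, each partition receives its own random name.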

cudf.io.orc.read_orc(filepath_or_buffer, engine='cudf', columns=None, filters=None, stripes=None, skiprows=None, num_rows=None, use_index=True, decimals_as_float=True, force_decimal_scale=None, timestamp_type=None, **kwargs)

Load an ORC dataset into a DataFrame

Parameters
filepath_or_bufferstr, path object, bytes, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).

engine{ ‘cudf’, ‘pyarrow’ }, default ‘cudf’

Parser engine to use.

columnslist, default None

If not None, only these columns will be read from the file.

filterslist of tuples or list of lists of tuples, default None

If not None, specifies a filter predicate used to filter out row groups using statistics stored for each row group as Parquet metadata. Row groups that do not match the given filter predicate are not read. The predicate is expressed in disjunctive normal form (DNF) like [[(‘x’, ‘=’, 0), …], …]. DNF allows arbitrary boolean logical combinations of single column predicates. The innermost tuples each describe a single column predicate. The list of inner predicates is interpreted as a conjunction (AND), forming a more selective and multiple column predicate. Finally, the outermost list combines these filters as a disjunction (OR). Predicates may also be passed as a list of tuples. This form is interpreted as a single conjunction. To express OR in predicates, one must use the (preferred) notation of list of lists of tuples.

stripeslist, default None

If not None, only these stripes will be read from the file. Stripes are concatenated with index ignored.

skiprowsint, default None

If not None, the number of rows to skip from the start of the file.

num_rowsint, default None

If not None, the total number of rows to read.

use_indexbool, default True

If True, use row index if available for faster seeking.

kwargs are passed to the engine
Returns
DataFrame

Notes

  • cuDF supports local and remote data stores. See configuration details for available sources here.

Examples

>>> import cudf
>>> df = cudf.read_orc(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.orc.read_orc_metadata(path)

Read an ORC file’s metadata and schema

Parameters
pathstring or path object

Path of file to be read

Returns
Total number of rows
Number of stripes
List of column names

Examples

>>> import cudf
>>> num_rows, stripes, names = cudf.io.read_orc_metadata(filename)
>>> df = [cudf.read_orc(filename, stripes=[i]) for i in range(stripes)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.orc.read_orc_statistics(filepath_or_buffer, columns=None, **kwargs)

Read an ORC file’s file-level and stripe-level statistics

Parameters
filepath_or_bufferstr, path object, bytes, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).

columnslist, default None

If not None, statistics for only these columns will be read from the file.

Returns
Statistics for each column of given file
Statistics for each column for each stripe of given file
cudf.io.orc.to_orc(df, fname, compression=None, enable_statistics=True, **kwargs)

Write a DataFrame to the ORC format.

Parameters
fnamestr

File path or object where the ORC dataset will be stored.

compression{ ‘snappy’, None }, default None

Name of the compression to use. Use None for no compression.

enable_statisticsboolean, default True

Enable writing column statistics.

cudf.io.json.read_json(path_or_buf, engine='auto', dtype=True, lines=False, compression='infer', byte_range=None, *args, **kwargs)

Load a JSON dataset into a DataFrame

Parameters
path_or_bufstr, path object, or file-like object

Either JSON data in a str, path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), or any object with a read() method (such as builtin open() file handler function or StringIO).

engine{ ‘auto’, ‘cudf’, ‘pandas’ }, default ‘auto’

Parser engine to use. If ‘auto’ is passed, the engine will be automatically selected based on the other parameters.

orientstring

Indication of expected JSON string format (pandas engine only). Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:

  • 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

  • 'records' : list like [{column -> value}, ... , {column -> value}]

  • 'index' : dict like {index -> {column -> value}}

  • 'columns' : dict like {column -> {index -> value}}

  • 'values' : just the values array

The allowed and default values depend on the value of the typ parameter.

  • when typ == 'series',

    • allowed orients are {'split','records','index'}

    • default is 'index'

    • The Series index must be unique for orient 'index'.

  • when typ == 'frame',

    • allowed orients are {'split','records','index', 'columns','values', 'table'}

    • default is 'columns'

    • The DataFrame index must be unique for orients 'index' and 'columns'.

    • The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.

typtype of object to recover (series or frame), default ‘frame’

With cudf engine, only frame output is supported.

dtypeboolean or dict, default True

If True, infer dtypes. If a dict mapping column to dtype, use those dtypes. If False, do not infer dtypes at all. Applies only to the data.

convert_axesboolean, default True

Try to convert the axes to the proper dtypes (pandas engine only).

convert_datesboolean, default True

Either a boolean or a list of columns to parse for dates (pandas engine only). If True, try to parse datelike columns. A column label is datelike if

  • it ends with '_at',

  • it ends with '_time',

  • it begins with 'timestamp',

  • it is 'modified', or

  • it is 'date'
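The five label rules listed above can be written as a small predicate. This is only a sketch of the rules as documented here; the function name is_datelike is hypothetical and the pandas engine may apply additional checks.

```python
def is_datelike(label):
    """Apply the five documented label rules to a single column label."""
    if not isinstance(label, str):
        return False
    return (
        label.endswith("_at")
        or label.endswith("_time")
        or label.startswith("timestamp")
        or label in ("modified", "date")
    )


columns = ["created_at", "start_time", "timestamp_ms", "modified", "date", "price", "updated"]
datelike = [c for c in columns if is_datelike(c)]
```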

keep_default_datesboolean, default True

If parsing dates, parse the default datelike columns (pandas engine only)

numpyboolean, default False

Direct decoding to numpy arrays (pandas engine only). Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.

precise_floatboolean, default False

Set to enable usage of higher precision (strtod) function when decoding string to double values (pandas engine only). Default (False) is to use fast but less precise builtin functionality

date_unitstring, default None

The timestamp unit to detect if converting dates (pandas engine only). The default behavior is to try and detect the correct precision, but if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force parsing only seconds, milliseconds, microseconds or nanoseconds.

encodingstr, default is ‘utf-8’

The encoding to use to decode py3 bytes. With cudf engine, only utf-8 is supported.

linesboolean, default False

Read the file as a json object per line.

chunksizeinteger, default None

Return JsonReader object for iteration (pandas engine only). See the line-delimited json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip, bz2, zip or xz if path_or_buf is a string ending in ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’, respectively, and no decompression otherwise. If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
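The ‘infer’ rule above amounts to a lookup on the file extension. A minimal sketch, assuming a hypothetical helper named infer_compression (not a cuDF function):

```python
# Extension-to-codec mapping taken from the parameter description.
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def infer_compression(path_or_buf, compression="infer"):
    """Resolve the effective compression for a given input and compression argument."""
    if compression != "infer":
        return compression  # an explicit choice (including None) is used as-is
    if not isinstance(path_or_buf, str):
        return None  # inference only applies to string paths
    for extension, name in _EXTENSION_TO_COMPRESSION.items():
        if path_or_buf.endswith(extension):
            return name
    return None


print(infer_compression("logs/data.json.gz"))
print(infer_compression("logs/data.json"))
```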

byte_rangelist or tuple, default None

Byte range within the input file to be read (cudf engine only). The first number is the offset in bytes, the second number is the range size in bytes. Set the size to zero to read all data after the offset location. Reads the row that starts before or at the end of the range, even if it ends after the end of the range.

Returns
resultSeries or DataFrame, depending on the value of typ.
cudf.io.json.to_json(cudf_val, path_or_buf=None, *args, **kwargs)

Convert the cuDF object to a JSON string. Note nulls and NaNs will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters
path_or_bufstring or file handle, optional

File path or object. If not specified, the result is returned as a string.

orientstring

Indication of expected JSON string format.

  • Series
    • default is ‘index’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}

  • DataFrame
    • default is ‘columns’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}

  • The format of the JSON string
    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, and the data component is like orient='records'.
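To make the orient shapes listed above concrete, the following standard-library sketch builds each one for a tiny two-row table. It does not call cuDF; the names index, columns, data, and to_orient are made up for the example, and ‘table’ is omitted because its schema component is not specified here.

```python
import json

index = ["r0", "r1"]
columns = ["a", "b"]
data = [[1, 2], [3, 4]]  # one inner list per row


def to_orient(orient):
    """Return the Python structure that each orient describes."""
    if orient == "split":
        return {"index": index, "columns": columns, "data": data}
    if orient == "records":
        return [dict(zip(columns, row)) for row in data]
    if orient == "index":
        return {i: dict(zip(columns, row)) for i, row in zip(index, data)}
    if orient == "columns":
        return {
            c: {i: row[j] for i, row in zip(index, data)}
            for j, c in enumerate(columns)
        }
    if orient == "values":
        return data
    raise ValueError(f"unsupported orient: {orient!r}")


for orient in ("split", "records", "index", "columns", "values"):
    print(orient, json.dumps(to_orient(orient)))
```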

date_format{None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precisionint, default 10

The number of decimal places to use when encoding floating point values.

force_asciibool, default True

Force encoded string to be ASCII.

date_unitstring, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handlercallable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.

linesbool, default False

If ‘orient’ is ‘records’, write out line-delimited JSON. Raises ValueError for any other ‘orient’, since the other formats are not list-like.
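Line-delimited output works for ‘records’ because that orient is a list whose elements can each be serialized on their own line. A minimal standard-library sketch (the helper name to_json_lines is hypothetical, and whether a trailing newline is written is not specified here):

```python
import json

records = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]


def to_json_lines(records):
    """Serialize each record as one JSON document per line."""
    return "\n".join(json.dumps(record) for record in records)


text = to_json_lines(records)

# Reading it back is a per-line parse.
parsed = [json.loads(line) for line in text.splitlines()]
```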

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

indexbool, default True

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.

cudf.io.avro.read_avro(filepath_or_buffer, engine='cudf', columns=None, skiprows=None, num_rows=None, **kwargs)

Load an Avro dataset into a DataFrame

Parameters
filepath_or_bufferstr, path object, bytes, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).

engine[‘cudf’], default ‘cudf’

Parser engine to use.

columnslist, default None

If not None, only these columns will be read.

skiprowsint, default None

If not None, the number of rows to skip from the start of the file.

num_rowsint, default None

If not None, the total number of rows to read.

Returns
DataFrame

Notes

  • cuDF supports local and remote data stores. See configuration details for available sources here.

Examples

>>> import pandavro
>>> import pandas as pd
>>> import cudf
>>> pandas_df = pd.DataFrame()
>>> pandas_df['numbers'] = [10, 20, 30]
>>> pandas_df['text'] = ["hello", "rapids", "ai"]
>>> pandas_df
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
>>> pandavro.to_avro("data.avro", pandas_df)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
cudf.io.dlpack.from_dlpack(pycapsule_obj)

Converts from a DLPack tensor to a cuDF object.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a PyCapsule object which contains a pointer to a DLPack tensor as input, and returns a cuDF object. This function deep copies the data in the DLPack tensor into a cuDF object.

Parameters
pycapsule_objPyCapsule

Input DLPack tensor pointer which is encapsulated in a PyCapsule object.

Returns
A cuDF DataFrame or Series, depending on whether the input DLPack tensor is 2D or 1D.

Notes

cuDF from_dlpack() assumes column-major (Fortran order) input. If the input tensor is row-major, transpose it before passing it to this function.
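The difference between the two memory orders can be seen by flattening a small table both ways. This sketch uses plain Python lists and hypothetical helper names; it illustrates the layouts only and does not involve DLPack or cuDF.

```python
def flatten_row_major(rows):
    """C order: the values of each row are stored next to each other."""
    return [value for row in rows for value in row]


def flatten_column_major(rows):
    """Fortran order: the values of each column are stored next to each other."""
    n_cols = len(rows[0])
    return [row[j] for j in range(n_cols) for row in rows]


def transpose(rows):
    """Swap rows and columns."""
    return [list(column) for column in zip(*rows)]


table = [[1, 10], [2, 20], [3, 30]]  # 3 rows, 2 columns

row_major = flatten_row_major(table)
column_major = flatten_column_major(table)
```

Flattening the transposed table in row-major order yields the same sequence as flattening the original in column-major order, which is why a transpose converts between the two layouts.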

cudf.io.dlpack.to_dlpack(cudf_obj)

Converts a cuDF object to a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object as input, and returns a PyCapsule object which contains a pointer to DLPack tensor. This function deep copies the data in the cuDF object into the DLPack tensor.

Parameters
cudf_objcuDF Object

Input cuDF object.

Returns
A DLPack tensor pointer which is encapsulated in a PyCapsule object.

Notes

cuDF to_dlpack() produces column-major (Fortran order) output. If the output tensor needs to be row major, transpose the output of this function.

cudf.io.feather.read_feather(path, *args, **kwargs)

Load a feather object from the file path, returning a DataFrame.

Parameters
pathstring

File path

columnslist, default=None

If not None, only these columns will be read from the file.

Returns
DataFrame

Examples

>>> import cudf
>>> df = cudf.read_feather(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
cudf.io.feather.to_feather(df, path, *args, **kwargs)

Write a DataFrame to the feather format.

Parameters
pathstr

File path

cudf.io.hdf.read_hdf(path_or_buf, *args, **kwargs)

Read from the store, close it if we opened it.

Retrieve pandas object stored in file, optionally based on where criteria

Parameters
path_or_bufstring, buffer or path object

Path to the file to open, or an open HDFStore object. Supports any object implementing the __fspath__ protocol. This includes pathlib.Path and py._path.local.LocalPath objects.

keyobject, optional

The group identifier in the store. Can be omitted if the HDF file contains a single pandas object.

mode{‘r’, ‘r+’, ‘a’}, optional

Mode to use when opening the file. Ignored if path_or_buf is a pandas.HDFStore. Default is ‘r’.

wherelist, optional

A list of Term (or convertible) objects.

startint, optional

Row number to start selection.

stopint, optional

Row number to stop selection.

columnslist, optional

A list of column names to return.

iteratorbool, optional

Return an iterator object.

chunksizeint, optional

Number of rows to include in an iteration when using an iterator.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

**kwargs

Additional keyword arguments passed to HDFStore.

Returns
itemobject

The selected object. Return type depends on the object stored.

See also

cudf.io.hdf.to_hdf

Write a HDF file from a DataFrame.

cudf.io.hdf.to_hdf(path_or_buf, key, value, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

To add another DataFrame or Series to an existing HDF file, use append mode and a different key.

For more information see the user guide.

Parameters
path_or_bufstr or pandas.HDFStore

File path or HDFStore object.

keystr

Identifier for the group in the store.

mode{‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

  • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

  • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’: similar to ‘a’, but the file must already exist.

format{‘fixed’, ‘table’}, default ‘fixed’

Possible values:

  • ‘fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

  • ‘table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

appendbool, default False

For Table formats, append the input data to the existing.

data_columnslist of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format=’table’.

complevel{0-9}, optional

Specifies a compression level for data. A value of 0 disables compression.

complib{‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

fletcher32bool, default False

If applying compression use the fletcher32 checksum.

dropnabool, default False

If True, rows in which all values are NaN will not be written to the store.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

See also

cudf.io.hdf.read_hdf

Read from HDF file.

cudf.io.parquet.to_parquet

Write a DataFrame to the binary parquet format.

cudf.io.feather.to_feather

Write out feather-format for DataFrames.

Extending cuDF

cudf.api.extensions.accessor.register_dataframe_accessor(name)

Extends cudf.DataFrame with custom defined accessor

Parameters
namestr

The name to be registered in DataFrame for the custom accessor

Returns
decoratorcallable

Decorator function for accessor

Notes

The DataFrame object will be passed to your custom accessor upon first invocation, and the accessor will be cached for future calls.

If the data passed to your accessor is of the wrong datatype, you should raise an AttributeError, consistent with other cudf methods.
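The construct-once-then-cache behaviour described in these notes is commonly implemented with a descriptor. The sketch below shows that general pattern in plain Python; it is an assumption about the technique, not cuDF's source, and CachedAccessor, register_accessor, and Table are hypothetical names.

```python
class CachedAccessor:
    """Non-data descriptor that builds the accessor on first access and caches it."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            return self._accessor_cls  # accessed on the class, not an instance
        accessor = self._accessor_cls(obj)
        # Store on the instance so later lookups bypass this descriptor.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, target_cls):
    """Return a decorator that attaches accessor_cls to target_cls under name."""
    def decorator(accessor_cls):
        setattr(target_cls, name, CachedAccessor(name, accessor_cls))
        return accessor_cls
    return decorator


class Table:
    """Stand-in for a DataFrame: a dict of column name to list of values."""

    def __init__(self, columns):
        self.columns = columns


@register_accessor("point", Table)
class PointsAccessor:
    instances = 0  # counts constructions, to show the caching

    def __init__(self, obj):
        if not {"x", "y"} <= set(obj.columns):
            raise AttributeError("Must have columns 'x' and 'y'.")
        type(self).instances += 1
        self._obj = obj

    @property
    def bounding_box(self):
        xs, ys = self._obj.columns["x"], self._obj.columns["y"]
        return (min(xs), min(ys), max(xs), max(ys))


t = Table({"x": [1, 2, 3], "y": [7, 6, 5]})
first = t.point
second = t.point  # served from the instance cache, no second construction
```

Raising AttributeError from the accessor's constructor also means that hasattr(obj, "point") reports False for objects that fail validation.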

Examples

In your library code:

>>> import cudf as gd
>>> @gd.api.extensions.register_dataframe_accessor("point")
... class PointsAccessor:
...     def __init__(self, obj):
...         self._validate(obj)
...         self._obj = obj
...     @staticmethod
...     def _validate(obj):
...         cols = obj.columns
...         if not all([vertex in cols for vertex in ["x", "y"]]):
...             raise AttributeError("Must have vertices 'x', 'y'.")
...     @property
...     def bounding_box(self):
...         xs, ys = self._obj["x"], self._obj["y"]
...         min_x, min_y = xs.min(), ys.min()
...         max_x, max_y = xs.max(), ys.max()
...         return (min_x, min_y, max_x, max_y)

Then in user code:

>>> df = gd.DataFrame({'x': [1,2,3,4,5,6], 'y':[7,6,5,4,3,2]})
>>> df.point.bounding_box
(1, 2, 6, 7)
cudf.api.extensions.accessor.register_index_accessor(name)

Extends cudf.Index with custom defined accessor

Parameters
namestr

The name to be registered in Index for the custom accessor

Returns
decoratorcallable

Decorator function for accessor

Notes

The Index object will be passed to your custom accessor upon first invocation, and the accessor will be cached for future calls.

If the data passed to your accessor is of the wrong datatype, you should raise an AttributeError, consistent with other cudf methods.

Examples

In your library code:

>>> import cudf as gd
>>> @gd.api.extensions.register_index_accessor("odd")
... class OddRowAccessor:
...     def __init__(self, obj):
...         self._obj = obj
...     def __getitem__(self, i):
...         return self._obj[2 * i - 1]

Then in user code:

>>> gs = gd.Index(list(range(0, 50)))
>>> gs.odd[1]
1
>>> gs.odd[2]
3
>>> gs.odd[3]
5
cudf.api.extensions.accessor.register_series_accessor(name)

Extends cudf.Series with custom defined accessor

Parameters
namestr

The name to be registered in Series for the custom accessor

Returns
decoratorcallable

Decorator function for accessor

Notes

The Series object will be passed to your custom accessor upon first invocation, and the accessor will be cached for future calls.

If the data passed to your accessor is of the wrong datatype, you should raise an AttributeError, consistent with other cudf methods.

Examples

In your library code:

>>> import cudf as gd
>>> @gd.api.extensions.register_series_accessor("odd")
... class OddRowAccessor:
...     def __init__(self, obj):
...         self._obj = obj
...     def __getitem__(self, i):
...         return self._obj[2 * i - 1]

Then in user code:

>>> gs = gd.Series(list(range(0, 50)))
>>> gs.odd[1]
1
>>> gs.odd[2]
3
>>> gs.odd[3]
5

GpuArrowReader

class cudf.comm.gpuarrow.GpuArrowReader(schema, dev_ary)

Methods

schema()

Return a pyarrow schema

to_dict()

Return a dictionary of Series objects
